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AN  INVESTIGATION  OF  KARMARKAR’S  ALGORITHM 
FOR  LINEAR  PROGRAMMING 

SUMMARY  FINAL  REPORT 


1.0  INTRODUCTION 

1.1  SCOPE 

This  report  summarizes  the  research  on  potential  extensions  and  applications  of 
the  Karmarkar  linear  programming  concept  that  was  conducted  by  Decision-Science 
Applications,  Inc.  (DSA)  for  the  Office  of  Naval  Research  (ONR)  under  Contract 
N00014-85-C-0254.  The  DSA  research  effort  was  initiated,  with  corporate  funds, 
immediately  after  the  publication  of  the  original  Karmarkar  results,  and  has  been 
continued  with  support  provided  by  the  Office  of  Naval  Research  under  the  above 
contract. 


1.2  BACKGROUND  AND  OBJECTIVES 

The  testing  and  assessment  of  the  Karmarkar  algorithm  by  the  mathematical 
community  has  made  it  clear  both  that  the  algorithm  has  a  great  deal  of  promise,  and 
that  the  Karmarkar  methods  will  require  substantial  additional  development  if  they  are 
to  displace  the  simplex  method  as  a  standard  technique  for  solving  large-scale  linear 
programming  problems.  As  a  result,  considerable  attention  is  being  given  to  the 
development  of  more  efficient  methods  for  accomplishing  the  large  matrix  inversion 
operation  which  accounts  for  the  bulk  of  the  computational  effort  within  the  Karmarkar 
algorithm. 

The  present  research  effort  has  been  focused  somewhat  differently  on  the 
possibilities  for  synergy  between  the  Karmarkar  concepts  and  the  iterative  Lagrangian 
optimization  methods  for  large  scale  non-linear  applications,  which  have  been  a  DSA 
specialty.  One  remarkable  aspect  of  the  Karmarkar  algorithm  is  that  it  defines  an 
iterative  computational  approach  for  linear  optimization  that  is  similar  in  many  ways  to 
the  iterative  methods  that  are  used  in  large-scale  non-linear  optimization  applications. 
Thus,  it  seemed  likely  that  the  potential  synergy  between  these  two  sets  of  optimization 
concepts  could  lead  to  new  developments  that  might  be  more  broadly  applicable  in  the 
general  field  of  mathematical  optimization. 

For  this  reason,  the  first  objective  of  the  DSA  research  effort  has  been  to 
identify  and  clarify  the  essential  computational  principles  that  underlie  the  Karmarkar 
algorithm--with  the  objective  of  making  them  available  as  general  purpose  mathematical 
tools  that  can  be  more  widely  applied  to  mathematical  programming  problems  outside 
the  specific  context  of  the  Karmarkar  algorithm. 

The  second  objective  of  the  DSA  research  has  been  to  search  for  possible 
alternatives  to  the  Karmarkar  linear  programming  algorithm  that  might  preserve  its  main 
computational  advantages  while  avoiding  entirely  the  computationally  inefficient  matrix 
inversion  process. 

We  believe  that  substantial  progress  has  been  made  with  regard  to  both  of  the 
above  objectives. 
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1.3  SYNOPSIS  OF  PUBLISHABLE  RESEARCH  FINDINGS 


Analysis  of  Computational  Principles 

The  DSA  research  has  shown  that  the  Karmarkar  algorithm  is  mathematically 
equivalent  to  a  simple  steepest  descent  algorithm--in  which  "distance"  is  defined  in  terms 
of  a  logarithmic  distance  metric  defined  over  the  original  problem  variables,  and  in 
which  the  actual  objective  function  is  replaced  by  a  surrogate  objective  function  which 
includes  a  penalty  function  that  facilitates  boundary  avoidance. 

These  basic  findings  are  reported  in  A  Functional  Analysis  of  the  Karmarkar 
Linear  Programming  Algorithm ,  George  E.  Pugh,  DSA  Report  #676,  March  31,  1986 
(ref.  3.).  This  document,  which  is  being  transmitted  with  this  final  report,  is  also  being 
distributed  to  the  attendees  at  the  ONR  Monterey  Conference  (February  20-21,  1986)  on 
the  Karmarkar  algorithm.  It  is  expected  that  the  main  findings  of  the  report  will  be 
prepared  for  publication  in  a  mathematical  journal. 

Non-Matrix  Inversion  Alternatives 

The  development  of  alternative  versions  of  the  Karmarkar  algorithm  that  can 
operate  without  matrix  inversion  appears  to  require  two  important  departures  from  the 
basic  Karmarkar  algorithm.  First,  the  basic  algorithm  must  be  restructured  into  a  form 
that  is  less  sensitive  to  inaccuracies  in  the  calculated  direction  of  steepest  descent,  so  that 
approximation  methods  can  be  used  in  place  of  an  exact  matrix  inversion.  Second, 
computationally  efficient  and  reliable  methods  for  estimating  the  direction  of  steepest 
descent  without  matrix  inversion  must  be  developed. 

A  promising  approximation  method  which  was  developed  for  estimating  the 
direction  of  steepest  descent  without  matrix  inversion  is  described  in  another  draft  paper 
by  John  Danskin,  "A  Geometric  Algorithm  for  Approximate  Steepest  Ascent  with  Linear 
Constraints,"  which  is  included  as  Enclosure  A  to  this  report. 

One  very  promising  approach  which  was  developed  to  reduce  the  requirement 
for  precision  in  the  estimated  direction  of  steepest  descent  is  described  in  a  draft  paper 
by  John  Danskin,  "A  Variant  of  Karmarkar’s  Linear  Programming  Algorithm  not 
Requiring  Projection  onto  the  Null  Space,"  which  is  included  as  Enclosure  B  to  this 
report. 


1.4  OTHER  RESEARCH  FINDINGS. 

As  is  usually  the  case  in  any  exploratory  research  effort,  the  publishable 
material  represents  only  a  small  percentage  of  the  total  effort,  and  some  of  the  most 
interesting  findings  have  not  yet  reached  a  point  of  development  that  would  justify 
formal  publication.  The  remainder  of  this  report  provides  a  broad  overview  of  the 
research  effort,  and  outlines  some  of  the  new  insights  which  will  need  to  be  pursued  in 
a  continuation  of  the  research  effort. 


2.0 


TECHNICAL  PROBLEMS  WITH  THE  KARMARKAR  ALGORITHM 


About  two  years  ago,  Narendra  Karmarkar  proposed  a  radical  new  method  for 
solving  linear  programming  problems.  The  method  involved  the  use  of  a  projective 
transformation  to  map  the  space  in  which  the  problem  was  originally  defined  onto  a 
simplex,  in  such  a  way  that  the  current  position  point  is  mapped  into  the  barycenter. 

This  method  caused  a  sensation  in  the  linear  and  non-linear  programming 
communities.  It  seemed  to  be  extremely  fast  on  some  problems,  but  extremely  slow  on 
others.  There  were  difficult  technical  questions  as  to  how  to  achieve  feasibility  and  how 
to  efficiently  carry  out  the  unfamiliar  matrix  ii  'ersion  that  Karmarkar  requires;  and 
questions  as  to  the  adequacy  of  the  proof  of  convergence.  But  it  is  unquestionable  that 
Karmarkar’s  basic  idea  is  one  of  great  power. 

The  main  difficulty  with  the  algorithm  has  been  with  the  calculation  of  a 
direction  of  steepest  descent  within  the  simplex,  which  is  required  for  each  move.  To 
accomplish  this,  Karmarker  projects  the  objective  vector  into  the  "null  space"  defined  by 
the  constraints  and  the  simplex  equation.  He  accomplishes  this  projection  algebraically 
by  inverting  a  matrix. 

There  are  many  difficulties  in  inverting  large  matrices.  J.A.  Tomlin  in  "An 
Experimental  Approach  to  Karmarkar’s  Projective  Method  for  Linear  Programming," 
which  is  included  as  Enclosure  C,  writes  of  problems  encountered  with  Karmarkar’s 
matrices.  Their  "condition"  worsens  as  the  iteration  proceeds;  also,  any  known  inversion 
technique  will  decrease  the  sparsity  of  the  matrix. 

There  are  other  problems.  Computational  inverses  are  always  imperfect,  and 
the  Karmarkar  algorithm  is  sensitive  to  computational  inaccuracies.  Without  an  exact 
inverse,  the  position  vector  will  wander  away  from  feasibility.  Another  difficulty  is  that 
of  achieving  an  initial  feasible  point  in  the  form  Karmarkar  wants  it,  with  all  the 
coordinates  positive. 

Finally,  the  Karmarkar  proof  of  convergence  hinges  fundamentally  on  knowing 
the  value  of  the  function  at  the  optimum.  Karmarkar  dealt  with  this  problem  in  his 
original  Bell  Laboratory  paper  (ref.  1.)  by  using  a  rather  awkward  procedure  to  estimate 
the  optimum  value  as  the  algorithm  progressed.  But  he  offered  no  proof  of  convergence 
for  this  procedure.  In  his  revised  paper  in  Combinatorica  (ref.  2.),  he  solves  the 
problem  by  using  a  combined  primal-dual  problem  definition  in  which  the  value  of  the 
combined  objective  function  at  the  solution  point  can  be  guaranteed  to  be  zero.  This 
approach  provides  a  nice  solution  for  the  proof  of  convergence  at  the  expense  of  using  a 
much  larger  matrix  in  the  computational  process. 


i 
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3.0  OVERVIEW  OF  DSA  RESEARCH  EFFORT 


I 


3.1  SPECIFIC  RESEARCH  OBJECTIVES 

The  DSA  effort  focused  on  two  major  objectives. 

The  first  objective  was  to  develop  an  improved  understanding  of  the  practical 
computational  concepts  that  account  for  the  performance  of  the  Karmarkar  algorithm- - 
so  that  the  concepts  could  be  effectively  utilized  in  the  field  of  mathematical 
optimization  outside  the  context  of  Karmarkar’s  specific  algorithm.  This  objective  has 
been  essentially  achieved,  and  the  results  are  reported  in  DSA  Report  #676  (ref.  3). 


The  second  objective  was  concerned  with  the  development  and  study  of 
possible  variants  of  the  Karmarkar  algorithm,  with  the  objective  of  improving  the  speed 
and  reliability  of  the  algorithm.  This  part  of  the  DSA  effort  had  several  specific 
objectives.  These  included  the  development  of: 

•  Approximate  computational  methods  for  finding  steepest  ascent  that  could 
be  more  efficient  than  matrix  inversion,  particularly  because  many  such 
methods  preserve  matrix  sparseness. 


•  Alternative  algorithms  for  utilizing  the  Karmarkar  steepest  descent 
concept  that  are  more  tolerant  of  computational  inaccuracy  and  departures 
from  the  precise  null  space  than  the  original  Karmarkar  formulation. 

•  Alternative  transformations  of  the  original  problem  for  accomplishing  the 
same  functional  objective  as  the  Karmarkar  projection,  but  that  are 
mathematically  simpler  and  possibly  computational!;  tore  efficient. 


3.2  THE  RESEARCH  TEAM 

To  facilitate  interactions  between  the  new  Karmarkar  concepts  and  earlier 
iterative  techniques  developed  for  non-linear  programming  problems,  DSA  brought 
together  a  combination  of  theoretical  and  applied  mathematical  experience. 

John  M.  Danskin,  who  is  known  for  his  extensive  contributions  to  mathematical 
optimization  and  game  theory,  has  served  as  the  theoretical  mathematician  for  the 
project.  He  was  responsible  for  the  original  theoretical  insights  concerning  the 
Karmarkar  algorithm  that  led  DSA  to  initiate  the  project,  and  he  has  contributed  a 
substantial  fraction  of  the  theoretical  concepts  that  have  been  investigated.  In  addition 
to  his  theoretical  contributions,  Danskin  has  also  developed  numerous  Fortran  programs 
on  an  IBM  PC  that  he  has  used  to  test  some  of  his  theoretical  concepts. 

Most  of  the  applied  research  for  the  project,  including  computer  validation  and 
large-scale  testirg  of  algorithms,  has  been  conducted  by  Fred  Miercort,  who  has  more 
than  20  years  of  experience  in  applied  mathematics  and  optimization  methods.  Miercort 
was  responsible  for  most  of  the  initial  testing  of  variants  of  the  Karmarkar  concept  that 
provided  the  foundation  for  a  more  theoretical  functional  understanding  of  the 
algorithm. 

The  project  was  supervised  by  George  F.  Pugh,  who  has  many  years  of 
practical  experience  in  solving  very  large  non-linear  optimization  problems.  In  addition 
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to  his  role  as  supervisor  for  the  project,  Pugh  was  also  responsible  for  developing  the 
functional  understanding  of  the  algorithm  as  reflected  in  DSA  Report  #676  (ref.  3). 
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4.0  SUMMARY  OF  RESEARCH  BY  JOHN  M.  DANSKIN 


4.1  EARLY  RESEARCH  CONTRIBUTIONS 

Because  of  the  promise  of  the  Karmarkar  approach  and  the  numerous  problems 
with  the  current  Karmarkar  formulation,  Danskin  began  in  November  1984  to  seek 
methods  for  avoiding  the  exact-inverse,  destruction  of  sparseness,  and  feasibility 
problems.  He  interested  DSA  in  a  joint  research  program  on  the  topic,  and  continued 
with  his  investigations.  Danskin  made  a  number  of  very  important  early  contributions  to 
the  research  effort. 

•  He  provided  a  detailed  assessment  and  interpretation  of  the  Karmarkar 
concept  that  provided  the  foundation  for  much  of  the  DSA  effort. 

•  He  suggested  the  use  of  the  Lemke  closest  point  principle  as  a  way  of 
converting  the  calculation  of  a  direction  of  steepest  ascent  into  a 
quadratic  distance  minimization  problem. 

•  He  developed  a  number  of  approximate  methods  for  solving  the  quadratic 
minimization  problem,  including  a  method  of  "maximum  scoop"  that  has 
performed  as  well  as,  or  better  than,  any  other  method  that  was  tested  by 
Miercort. 

•  He  has  explored  several  ways  of  utilizing  primal-dual  relationships  to 
restructure  standard  LP  problems  so  as  to  avoid  some  of  the  theoretical 
and  computational  problems  of  the  Karmarkar  approach. 


4.2  PRESENT  DANSKIN  APPROACH 

By  June  1985,  Danskin,  after  numerous  excursions,  arrived  at  a  "balanced 
reduction  of  deficits"  approach  which  seems  particularly  promising.  In  this  approach, 
which  is  explained  in  sections  4  and  5  of  Enclosure  B,  he  uses  the  Karmarkar  simplex 
and  projective  transformations,  but  otherwise  departs  radically  from  Karmarkar. 

The  approach  never  requires  a  feasible  point.  It  starts,  instead,  with  an 
arbitrary  positive  primal,  dual  pair  (X,Y)  which  satisfies  the  "connecting  equation" 

C  X  =  B  Y,  but  which  is  otherwise  infeasible  relative  to  many  of  the  primal  and  dual 
inequality  requirements. 

The  magnitude  of  the  infeasibilities,  measured  positively,  are  referred  to  as 
"deficits."  It  may  happen  that  the  "deficits"  corresponding  to  some  of  the  inequalities  are 
negative.  Additive  "slack"  variables,  somewhat  analogous  to  the  slack  variables  of  the 
simplex  method,  are  then  introduced  so  as  to  make  all  the  deficits  positive  and  equal. 

Starting  with  the  deficits  all  equal,  a  new  dummy  variable,  W,  is  introduced 
which  is  equal  to  them.  The  algorithm  then  seeks  to  find  a  way  down  for  all  the 
deficits  which  keeps  them  all  approximately  equal.  On  succeeding  moves,  any  small 
deviations  from  the  equality  balance  are  partially  corrected  in  such  a  way  that,  as  W 
decreases  towards  zero,  the  deficits  continue  to  remain  within  25%  of  W.  At  no  point  is 
extreme  accuracy  required. 
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The  method  never  requires  a  feasible  point,  and  it  obviates  the  need  for  an 
accurate  projection.  In  a  certain  technical  sense,  the  method  is  "projecting,"  but  not  in 
Karmarkar’s  sense. 

The  key  to  the  success  of  such  an  approach,  of  course,  lies  in  the  availability  of 
a  computationally  efficient  method  for  estimating  a  direction  that  will  maintain  an 
approximate  balance  of  the  deficits  during  the  convergence  process.  At  the  time  the 
paper  in  Enclosure  B  was  written,  Danskin  was  using  a  new  geometric  (rather  than 
algebraic)  algorithm  for  steepest  ascent,  subject  to  equality  constraints,  which  is  included 
here  as  Enclosure  A. 

This  algorithm  works  by  always  having  a  rate  of  ascent  greater  than  the 
steepest  constrained  rate,  and  by  gradually  reducing  the  violations  of  the  constraints. 
When  this  steepest  ascent  method  is  applied  to  the  balanced  reduction  algorithm, 
directions  are  accepted  as  good  even  if  the  balanced  reduction  of  constraints  is  violated 
by  some  moderate  amount  such  as  25%.  The  resulting  error  is  corrected  in  the  next 
move. 


More  recently,  Danskin  has  developed  an  approach  which  seems  to  perform 
much  more  efficiently,  which  is  based  on  the  application  of  a  game  solution  concept 
which  Danskin  identifies  as  a  "stack  game."  Regardless  of  what  method  is  used  to 
estimate  a  direction  of  balanced  reduction,  the  method  has  the  major  advantage  that  it 
requires  no  operations  at  all  on  the  matrix  other  than  row  multiplications  and,  thus,  it 
completely  preserves  the  sparsity  of  the  matrix. 

The  two  papers  discussed  above  have  been  written  under  the  current  ONR 
contract.  The  corresponding  computer  programs,  with  some  of  the  more  recent 
techniques  for  estimating  the  move  direction,  have  been  used  to  solve  linear 
programming  problems  with  several  thousand  non-zero  entries  on  an  IBM  personal 
computer — one  that  is  not  even  equipped  with  hardware  floating  point  capability. 


4.3  A  FAST  POLYNOMIAL  METHOD  FOR  REDUCING  THE  DEFICITS 

Subsequent  to  the  completion  of  the  work  on  this  contract  Danskin  has  found  a 
new  method  for  finding  a  direction  which  drives  the  deficits  down  at  exactly  the  same 
rate.  The  method  takes  at  most  q  -  1  steps,  where  q  is  the  number  of  constraints. 
Each  step  requires  pqe  multiplications,  where  e  is  the  density  of  the  non-zero 
elements.  The  method  is  an  outgrowth  of  the  approach  given  in  section  2  of  Enclosure 
B,  but  differs  in  the  methods  of  solution  defined  thereafter. 

It  has  now  been  incorporated  in  a  running  LP  routine  which  has  been 
developed  on  DSA’s  and  Danskin’s  own  time.  An  early  version  is  included  as 
Enclosure  D.  At  the  time  that  this  draft  was  written,  Danskin  had  not  yet  discovered 
that  it  is  not  necessary,  in  forming  a  direction  for  a  move,  to  subtact  off  the  projections 
on  all  the  previous  directions.  One  need  only  subtract  off  the  projection  on  the  last 
previous  move.  This  discovery  eliminated  a  factor  of  q/2,  the  required  number  of 
multiplications,  and  is  the  basis  for  the  possibility  that  the  new  method  will  be  really 
competitive  in  practice. 
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4.4  OTHER  DANSKIN  RESEARCH 

Danskin  has  recently  begun  work  on  the  application  of  these  concepts  to 
quadratic  programming.  He  hopes  to  apply  the  balanced-reduction-of-deficits  method 
to  provide  an  iterative  method  for  the  solution  of  large  quadratic  programming  problems. 
He  has  found  a  theorem  which  appears,  after  checking  with  leading  experts,  to  be  new, 
on  "pseudo-duality,"  which  makes  it  possible  to  convert  a  quadratic  programming 
problem  with  linear  constraints  into  a  form  that  can  be  addressed  by  methods  very 
similar  to  those  outlined  above. 
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5.0  EXPLORATORY  RESEARCH  BY  MIERCORT  AND  PUGH 


This  exploratory  research  has  focused  on  four  major  topics. 


5.1  SIMPLIFICATION  OF  KARMARKAR  TRANSFORMATION 

The  transformation  Karmarkar  uses  to  map  the  original  LP  problem  onto  a 
simplex  would  be  linear  except  for  his  use  of  a  denominator  that  makes  the 
transformation  non-linear.  The  presence  of  the  denominator  seems  to  introduce  many 
complications  into  the  algorithm.  For  example,  it  appears  to  be  an  important  factor  in 
motivating  Karmarkar  to  work  with  a  complicated  surrogate  objective  function. 

From  an  analysis  of  the  effects  of  the  distance  metric  on  the  direction  of  the 
Karmarkar  moves,  Pugh  became  convinced  that  Karmarkar’s  denominator  (which  was 
necessary  for  his  proof  of  convergence)  is  not  essential  to  the  operation  of  the  algorithm. 
To  test  this  conjecture,  Miercort  developed  and  tested  a  simplified  algorithm  which  does 
not  use  a  surrogate  objective  function,  and  which  omits  the  denominator  from  all 
calculations. 

Although  no  implementation  of  the  full  Karmarkar  algorithm  was  available  at 
that  time  with  which  we  could  make  direct  comparisons,  the  operation  of  the  simplified 
algorithm  seemed  indistinguishable  from  what  had  been  reported  by  other  investigators 
using  the  full  Karmarkar  transformation.  About  the  same  number  of  moves  were 
required  to  achieve  convergence  on  comparable  problems.  The  trade-off  between  the 
number  of  moves  and  the  size  of  a  move  was  the  same  as  reported  by  other 
investigators,  and  the  same  types  of  problems  were  encountered  with  the  matrix 
inversions  required  to  solve  the  steepest  ascent  sub-problem.  We  learned  from  the  ONR 
Monterey  meeting  that  we  had  not  been  alone  in  this  approach.  One  speaker  there  listed 
more  than  a  dozen  similar  "affine  transformation"  approaches. 

Subsequent  research  (ref.  3)  has  shown  that  the  omission  of  the  surrogate 
objective  function  does  somewhat  reduce  the  efficiency  of  the  Karmarkar  algorithm. 

But  the  differences  in  the  performance  were  too  small  to  be  apparent  at  that  time 
without  a  direct  side-by-side  comparison.  The  subsequent  research  has  also  suggested, 
however,  that  the  use  of  the  simplified  distance  metric  without  the  denominator  may 
actually  be  somewhat  more  efficient  than  the  full  Karmarkar  distance  metric. 


5.2  A  TEST  BED  TO  INVESTIGATE  COMPUTATIONAL  ACCURACY 

PROBLEMS 

DSA’s  experimental  work  with  various  forms  of  this  simplified  algorithm 
focused  on  number  of  test  problems,  ranging  from  only  a  few  variables  to  several 
hundred.  Tests  were  made  using  both  sparse  matrices,  that  are  characteristic  of  most 
commercial  linear  programming,  and  dense  matrices  that  are  common  in  game  theory 
optimization  problems. 

Since  the  basic  Karmarkar  approach  involves  an  iterative  computational 
algorithm,  one  might  reasonably  expect  that  it  would  be  self-correcting  and  tolerant  of 
computational  inaccuracy.  Indeed,  Karmarkar  himself,  in  his  original  paper,  made  just 
such  a  claim.  But  the  experience  of  all  other  investigators  has  been  that  very  high 
precision  is  required  in  the  matrix  inversion  to  avoid  drifting  away  from  the  feasible 
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null  space  in  a  succession  of  the  prescribed  Karmarkar  moves.  Moreover,  such  drift  is 
not  self-correcting  in  the  basic  algorithm,  and  the  difficulty  becomes  more  severe  as  the 
algorithm  approaches  the  solution  point. 

To  facilitate  an  investigation  of  problems  in  computational  accuracy,  the  "test- 
bed  algorithm"  was  designed  so  that  iterative  approximate  methods  could  be  used 
interchangeably  with  the  more  standard  matrix  inversion  methods.  Moreover,  the  test¬ 
bed  was  so  structured  that  the  performance  of  each  iterative  method  could  be  measured 
relative  to  the  more  accurate  matrix  inversion  results. 

Two  basic  types  of  test  problems  were  used  in  the  analysis.  With  the 
cooperation  of  KETRON  Inc.,  a  company  that  is  very  active  in  commercial  linear 
programming,  several  typical  test  problems  have  been  obtained,  ranging  from  about  10 
to  about  300  variables.  Like  most  commercial  linear  programming  problems,  they  tend 
to  be  quite  sparse.  But  they  are  also  problems  that  include  pitfalls  that  can  be 
troublesome  for  simple  linear  programming  codes.  The  second  basic  type  of  problem 
arises  in  the  solution  of  game  theory  problems.  We  have  concentrated  particularly  on  the 
so-called  "cookie-cutter"  game,  whose  matrix  is  generally  quite  dense.  The  formulation 
of  this  game  allows  an  easy  selection  of  a  wide  variety  of  problems  of  varying  size 
whose  solution  is  very  non-trivial. 


S.3  EVALUATION  OF  ALTERNATIVE  ITERATIVE  ALGORITHMS 

We  have  since  assessed  the  performance  of  a  wide  variety  of  iterative 
algorithms.  So  far,  none  of  the  methods  tested  has  performed  well  enough  to  be  a 
serious  competitor  to  the  matrix  inversion  method,  except  perhaps  on  problems  much 
larger  than  those  we  have  used  in  the  experimental  work. 

Almost  all  of  the  iterative  methods  that  have  been  tested  utilize  versions  of  the 
Lemke  principle,  which  converts  the  steepest  ascent  problem  into  a  quadratic  distance 
minimization  problem.  This  method  tries  to  find  that  linear  combination  of  the 
constraint  vectors  which  most  accurately  approximates  the  objective  function.  The 
vector  coefficients  that  are  produced  in  this  way  are  analogous  to  the  Lagrange 
multipliers  that  are  encountered  in  non-linear  programming  problems.  As  the  algorithm 
approaches  the  solution  point,  these  variables  converge  to  the  values  of  the  dual 
variables. 

The  methods  explored  include: 

1.  Well-known  heuristic  methods,  due  to  Hugh  Everett,  that  are  used  to 
adjust  the  Lagrange  multipliers  for  non-linear  programming  problems;1 

2.  A  simple  steepest  descent  algorithm; 

3.  Various  steepest  descent  algorithms,  altered  to  include  heuristic  learning 
mechanisms; 


These  heuristic  method*  (imply  raise  a  multiplier  by  lome  factor,  1+  a,  if  the  constraint  is  violated  in  an 
unconstrained  Lagrange  optimization,  and  lower  it  by  a  comparable  factor,  1-  0  ,  if  the  constraint  is  slack.  The 
factors,  a  and  0  ,  by  which  a  given  multiplier  is  adjusted  are  arbitrarily  increased  when  successive  adjustments  are 
in  the  same  direction  and  are  is  decreased  when  a  reversal  of  direction  occurs. 
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4.  A  steepest  quadratic  descent  algorithm  developed  by  John  Danskin,  which 
uses  a  curved  (i.e.,  quadratic)  trajectory  to  correct  for  the  first  order 
changes  in  the  direction  of  steepest  descent;  and 

5.  A  "maximum  scoop"  minimization  algorithm  (also  developed  by  Danskin), 
which  uses  a  steepest  descent  method,  in  the  space  of  directions,  to  search 
for  a  direction  vector  with  the  lowest  trajectory  minimum. 


Most  of  these  algorithms  have  also  been  tested  on  renormalized  representations  of  the 
linear  programming  problem  (in  which  the  units  of  the  constraints  and  the  activities 
have  been  modified  to  provide  a  natural  distance  metric  for  defining  a  direction  of 
steepest  descent). 

We  have  not  had  the  resources  under  this  contract  to  test  some  of  Danskin’s 
more  recent  direction-finding  algorithms,  such  as  the  algorithm  described  in  Enclosure 
A  or  his  newest  algorithm,  which  has  generated  exceptionally  good  performance  on  an 
IBM  PC.  This  newest  algorithm,  which  looks  promising,  utilizes  a  two-level  scoop 
method  in  combination  with  the  "stack  game"  optimization  concept. 

With  the  possible  exception  of  the  most  recent  "double  scoop"  concept,  none  of 
the  above  algorithms  has  yielded  the  uniformly  reliable  performance  on  all  of  the  test 
problems  that  would  be  desirable  in  a  standardized  optimization  program.  However,  the 
basic  structure  of  the  direction-  finding  problem  is  sufficiently  predictable  to  justify  the 
expectation  that  a  very  satisfactory  iterative  method  can  be  identified  that  can  be 
efficiently  used  in  combination  with  an  alternative  formulation  of  the  Karmarkar 
algorithm,  and  that  is  less  sensitive  to  inaccuracies  in  the  direction-finding  process. 
Clearly,  additional  research  on  this  topic  is  badly  needed. 

In  this  connection,  we  note  that,  as  this  report  is  being  prepared,  Danskin  has 
found  a  polynomial  algorithm  for  an  "exact  way  down"  without  matrix  inversion.  We 
are,  however,  submitting  this  report  before  that  algorithm,  found  on  "own  time,"  has 
been  tested. 


5.4  METHODS  TO  REDUCE  VULNERABILITY  TO  COMPUTATIONAL 

INACCURACIES 

This  section  discusses  some  of  our  approaches  intended  to  reduce  the 
vulnerability  of  the  Karmarkar  approach  to  computational  inaccuracies. 

Advance  Solution  of  Equality  Constraints.  The  difficulty  of  finding  move 
directions  that  can  preserve  feasibility  is  greatly  exacerbated  by  the  presence  of  equality 
constraints.  In  commercial  linear  programming  codes,  it  has  become  common  practice  to 
eliminate  the  equality  constraints  by  solving  the  equations  in  advance.  When  any 
variables  associated  with  equality  constraints  are  eliminated,  one  may  of  course  need  to 
introduce  additional  inequality  constraints  to  confine  the  solution  space  to  regions  where 
the  eliminated  variables  have  positive  values. 

From  our  preliminary  tests,  it  appears  that  such  a  removal  of  the  equality 
constraints  may  be  even  more  important  for  a  Karmarkar  type  of  algorithm  than  it  is  for 
the  standard  simplex  methodology.  Thus  it  may  be  appropriate  to  begin  by  converting 
any  LP  problem  into  a  canonical  form,  from  which  the  original  equality  constraints  have 
all  been  removed,  before  applying  a  Karmarkar-like  optimization  algorithm. 
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(Of  course,  the  Karmarkar  methodology  requires  the  introduction  of  slack 
variables  so  that  all  inequalities  are  ultimately  converted  into  equalities.  However,  the 
slack  type  of  inequalities  are  computationally  less  troublesome,  because  each  one  is 
directly  related  to  an  inequality  constraint- -and  the  Karmarkar  solution  process  takes 
place  in  a  feasible  region  where  these  constraints  are  not  rigidly  binding.) 

Infeasibilitv  Penalty  Functions.  The  use  of  penalty  functions  that  can  be  added 
to  the  LP  objective  function,  to  encourage  moves  that  tend  to  remove  accumulated 
feasibility  errors,  is  one  possible  approach  for  minimizing  the  vulnerability  of  the 
Karmarkar  method  to  computational  inaccuracies. 

In  this  connection,  we  have  examined  two  basic  strategies.  In  one,  the  moves 
alternate  between  those  driven  solely  by  the  feasibility  consideration  and  those  driven 
solely  by  the  optimality  objective.  In  the  other,  both  considerations  are  combined  in 
each  of  the  moves.  Both  methods  seem  to  work,  but  neither  has  shown  a  clear 
superiority. 

A  Primal-Dual  Bootstrap  Method.  Experience  with  approximate  solution 
methods  for  the  Karmarkar  algorithm  has  shown  that,  in  many  cases,  the  dual  variables 
(that  are  a  by-product  of  the  solution  of  the  steepest  ascent  sub-problem)  seem  to 
converge  much  more  rapidly  toward  a  solution  than  do  the  primal  variables.  This 
suggested  an  approach  in  which  both  the  primal  and  dual  representations  of  a  problem 
are  simultaneously  solved,  so  that  the  dual  variables  that  are  calculated  for  each 
representation  can  be  used  (in  what  we  have  labeled  a  "bootstrap"  move)  to  update  the 
primal  variables  for  the  other  representation.  This  approach  has  proved  to  be 
spectacularly  successful  in  many  cases.  It  not  only  moves  rapidly  toward  optimality,  but 
also  corrects  most  of  the  accumulated  feasibility  errors.  But  in  other  cases,  the  method 
totally  fails  to  converge. 

We  are  trying  to  understand  the  reasons  for  the  unpredictability  of  the  method, 
and  to  develop  indicators  that  can  be  used  to  determine  when  a  "bootstrap"  move  is 
likely  to  be  superior  to  a  Karmarkar  move. 

Assessment  of  Alternatives.  The  alternatives  noted  above  for  maintaining  exact 
or  approximate  feasibility  within  the  Karmarkar  algorithm  will  ultimately  have  to  be 
evaluated  in  competition  with  techniques  such  as  the  Danskin  reduction  of  deficits 
method — techniques  which  do  not  require  feasibility  in  the  computation  phase,  and 
which  only  approach  feasibility  in  the  final  stages  of  the  solution  process. 

Based  on  our  work  to  date,  we  feel  the  advance  solution  of  equality  constraints 
is  the  most  promising  we  have  tried.  The  use  of  a  feasibility  penalty  function  remains 
an  interesting  option,  while  the  boot-strap  method  appears  to  be  fundamentally  flawed. 
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FUTURE  RESEARCH  OBJECTIVES 


DSA  Report  No.  707 
page  -  13 


This  section  summarizes  research  objectives  for  the  immediate  future. 


•  Convergence  Proof  for  Simplified  Distance  Metric.  The  analysis  in  Ref.  3 
showed  that  the  Karmarkar  method  is  equivalent  to  a  simple  steepest 
descent  algorithm  in  which  distance  is  measured  in  a  logarithmic  distance 
metric  defined  over  the  original  variables.  The  analysis  suggests  that 
Karmarkar’s  convergence  proof  should  be  directly  applicable  to  an 
alternative  algorithm  which  uses  the  Karmarkar  surrogate  objective 
function  in  combination  with  the  logarithmic  distance  metric — equivalent 
to  what  would  be  obtained  by  omitting  the  Karmarkar  denominator.  That 
this  is  so  remains  to  be  proved. 

•  Convergence  Proof  Without  Surrogate  Function.  Computational 
experience  has  shown  that  remarkably  good  performance,  comparable  to 
the  Karmarkar  algorithm  on  most  of  our  test  problems,  can  be  obtained 
with  a  much  simpler  algorithm  which  ignores  the  surrogate  objective  and 
uses  a  simpler  distance  metric-equivalent  to  omitting  the  denominator  in 
the  Karmarkar  transformation.  However,  we  have  as  yet  no  formal  proof 
of  convergence  for  this  process. 

•  Performance  Test  of  New  Danskin  Method.  The  new  Danskin  method  is 
still  being  debugged.  As  it  is  being  programmed  and  tested,  important 
improvements  in  the  method  are  being  made.  When  these  are  complete, 
the  method  will  need  to  be  tested  against  alternatives  already  available. 

•  Analysis  of  "Bootstrap"  Move.  Theoretical  analysis  is  needed  to  assess  the 
circumstances  under  which  the  "bootstrap"  move  can  be  expected  to  be 
successful,  and  to  develop  indicators  that  can  be  used  during  computation 
to  decide  when  such  a  move  is  appropriate. 

•  Improved  Quadratic  Minimization.  Additional  work  is  badly  needed  on 
the  quadratic  minimization  sub-problem.  We  believe  that  this  aspect  of 
the  problem  is  crucial,  particularly  for  very  large  problems. 

•  Quadratic  Minimization  Methods  for  Other  Problems.  The  iterative 
quadratic  minimization  methods  that  are  being  developed  and  tested  for 
use  within  the  Karmarkar  algorithm  have  a  wide  variety  of  potential 
applications  to  other  practical  problems,  such  as  regression  analysis,  the 
optimization  of  electric  power  networks,  and  other  quadratic 
programming  problems.  Work  is  needed  to  assess  the  degree  to  which 
such  methods  might  be  useful  for  large-scale  problems  in  these  areas. 

•  Application  to  Non-Linear  Programming.  The  Karmarkar  concept  has 
many  features  that  are  particularly  attractive  for  non-linear  programming 
applications.  Many  of  these  applications--in  game  theory,  economics,  and 
optimal  control--now  use  either  linear  programming  or  Lagrange 
multiplier  optimization  methods  to  soive  an  optimization  sub-problem  that 
is  later  adjusted  to  match  non-linear  external  conditions  through  an 
external  iterative  loop.  In  such  applications,  the  linear  programming 
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algorithm  appears  as  a  sub-routine  that  is  called  repeatedly  during  the 
course  of  the  overall  optimization,  i.e.  "inside  a  DO  loop." 

In  such  a  system,  each  successive  sub-problem  will  typically  differ  only 
slightly  from  the  previous  one.  But  the  standard  simplex  methodology 
benefits  surprisingly  little  from  the  availability  of  an  approximately 
correct  existing  solution.  For  this  reason,  such  systems  are  sometimes 
designed  to  use  iterative  Lagrange  multiplier  methods  rather  than  the 
simplex  methodology.  But  because  of  the  iterative  nature  of  the 
Karmarkar  approach,  it  should  benefit  from  the  availability  of  an 
approximate  solution  in  much  the  same  way  as  the  Lagrange  multiplier 
methods.  Thus,  if  it  provides  a  reliable  method  of  solution,  it  might 
reasonably  be  expected  to  displace  the  simplex  method  in  many  such 
applications. 
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ENCLOSURE  A 

A  GEOMETRIC  ALGORITHM  FOR  APPROXIMATE 
STEEPEST  ASCENT  WITH  LINEAR  CONSTRAINTS 


BY  JOHN  M.  DANSKIN 


I 

I 


ABSTRACT 


The  paper  presents  a  new  method  for  approximating 
to  steepest  ascent  constrained  by  equalities.  The  time  for 
convergence  to  accuracy  g  is  for  a  given  problem  proportional 
to  log  (l/e)  .  but  the  proportionality  constant  in  the  estimate 
given  can  be  very  large.  The  method  also  applies  to  the  problem 
of  minimization  of  quadratic  forms  given  as  sums  of  squares  of 
linear  forms  in  all  of  Euclidean  space  without  boundaries. 
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5  l.  Introduction 


C  onside  r  the  p  roblem .  stated  for  3 .  {> ,  v 


Maximize  3.V  (l.U 


subject  to 


Denote  the  maximum  in  (1.  l)-(l.  2)  by  v  .  We  will 
replace  this  problem  by  an  e -problem.  in  which  we  seek,  for 
a  given  positive  z  .  either  to  be  told  that  v<  z  or  else  to  be 
provided  with  a  y  satisfying 

3  t  y  >  (l  -  e)v  (1.  3) 


and 


Iy  1  =  1  •  I  i  j  •  Y  |  <  e  •  j  =  l . q.  (1-4) 

We  present  here  an  algorithm,  geometric  in  nature.  for  producing 
a  solution  to  the  e -problem.  It  has  the  disadvantage  of  not  being 
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exact,  but  the  advantage  that  it  will  work  on  any  matrix,  and 
without  affecting  its  sparsity. 

We  have  a  convergence  proof,  but  no  speed  estimate 
in  terms  of  p,  q.  and  e  other  than  that,  for  a  given  problem, 
the  time  required  to  achieve  accuracy  e  is  proportional  to 
log  (l/e)  .  We  have  tried  crude  predecessors  of  this  method 

on  small  problems  with  encouraging  results,  and  are  certain 
that  this  will  be  much  faster.  We  intend,  before  this  paper  goes 
to  press,  to  have  tested  the  method  on  large  problems.  Our 
particular  application  is  to  a  much-modified  version  of  the 
Karmarkar  linear  programming  algorithm,  in  which  we  can 
do  with  rather  large  e  . 

We  explain  the  idea  behind  the  algorithm  in  §2  .  It 
comes  down  to  finding  the  closest  point  to  the  origin  on  a  linear 
manifold.  In  §3  we  describe  the  algorithm  itself.  In  §4  we 
analyze  the  algorithm  and  prove  that  it  converges.  In  §  5  we 
note  an  application  to  the  minimization  of  a  class  of  quadratic 


forms . 


3 


§  2  .  The  idea  of  the  algorithm 


We  may  suppose  that  |a  |  =  1  .  If  all  of  the  b.  are 
orthogonal  to  a  .  then  obviously  y  =  2  solves  the  problem.  If 
some  of  the  bj  are  orthogonal  to  3  and  some  are  not  .  we 
alte  r  those  which  are  by  adding  a  vector  5a  .  whe  re  6  €  (0,  e  /2); 
this  will  not  change  any  b  Y  by  as  much  as  c/2  .  So  from  now 
on  we  assume  that  no  b  ■  is  orthogonal  to  a.  Put 


J  = 


=  a 


J  J 


whe  re  3  .  =  l/(  3  .  b  .) 
-  J  J 


(2.  1) 


Then  a*xJ  =  l  for  all  j.  Now  denote  by  £  the  linear  manifold 
of  combinations 


x 


X.xJ 

J 


whe  re 


X  .  =  l 


_  it 


(2.  2) 


Then  z  »x  =  l  for  all  x  $  £  .  Evidently  s£  does  not  pass  through 
the  origin. 


Suppose  we  know  the  point  c  on  closest  to  the  origin. 

Put 


h 


3  -  c  /c 


2 


(2.  3) 


At 


A 


4 


If  h  =  0.  then,  for  any  y  satisfying  (1.2). 


3  •  y  =  (l/c2)c.y  =  (l/c2)  2,  Xjx^*V  =  ^  Cj(i>j*Y)  '  0 


where  the  \?  are  the  coordinates  of  c.  Hence  in  that  case  v  =0. 

J 

Otherwise  put 


=  |  h  I  .  y  =  (l/v) h  . 


Since  c  is  no rmal  to  any  line  in  £  .  then  c  •  (x  -  c )  =  0  for  any 

2  i  2  . 

x  6  sd  .  Hence  c  .x  =  c  on  £  ;  in  particular  c  «x  =  c  for  any  j. 

Using  this,  we  see  that 


h.xj  =  3.xj  -  (l/c2)  c.  xj  =  l  -  l  =  0 


for  all  j.  It  follows  that  y  satisfies  the  constraints  (1.2).  Next 
suppose  that  v'  satisfies  (1.2),  Then  c.v'  =  0  .  so  that 


3  •  Y 1  =  (a-c/c  )  «Y '  =  vv.y' 


(2.  7) 


This  implies  that  y  yields  the  unique  maximum  in  the  problem 
(l.  1) - (l.  2).  and  that  that  maximum  is  v  . 


The  matter  thus  comes  down  to  finding  the  closest  point 
c,  or  a  good  approximation  to  it.  That  is  what  our  algorithm  does. 


i 
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§  3  .  The  algorithm 


We  suppose  the  routine  has  arrived  at  a  X  and 
corresponding  x  satisfying  (2.2)  .  It  first  tests  to  see 
whether  we  have  sufficient  evidence  to  assure  that  v<  e. 

It  does  this  by  calculating 

h  =  2  -x/x2  (3. 1) 

with  the  x  at  hand  rather  than  with  c  as  at  (2.  3)  .  If 
| h |  <  e  .  then  v<  e  and  the  routine  terminates.  If 
|  h  |  >  s  ,  it  forms 


y  =  h/|h| 


(3.  2) 


and  tests  the  inequalities  in  (1.4)  ,  with  e  replaced  by  c/2  if 
the  Zk  has  been  altered  as  at  the  beginning  of  8  2  to  assure 
nonorthogonaiity  with  o.  .  If  all  of  those  tests  pass,  it  terminates. 
Otherwise  it  prepares  for  a  move. 


That  it  is  not  necessary  to  test  (1.  3)  ,  and  that  the 
above  |h|  <  test  is  correct,  is  proved  at  Lemma  D  in  §4  . 


The  square  of  the  distance  from  x  to  the  origin  is 


P  _9 

QU)  =  l  <  l  A.*/  )2  . 

i=l  j=l 


The  partials  with  respect  to  the  Xj 


are 


(3.  3) 


Qj(  X  )  =  2x  «x^  .  j  =  l . q  • 


(3.  4) 


The  routine  averages  the  Q  (X),  reverses  the  signs,  and 
forms  the  direction  numbers 


2x  *(x  -  xJ)  ,  j  =  1 . q  ,  (3.  5) 

in  which  x  is  the  barycenter  of  the  xJ  .  It  then  calculates 

q 

0  =  2[  ^  (  x  .  (  x  -  x^)  )2  ] .  (3,6) 

j  =  l 

Because  some  of  the  tests  have  failed,  p  >  0  .  It  now  forms 
the  direction  cosines 


O-  -  (l/o  )  2x  •  (  x  -  x^)  ,  j  =  l . q 


(3.  7) 


in  the  X -space,  moves  to  the  bottom  of  the  parabola  in  that 
direction,  and  proceeds. 


T 
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§ 4  .  Analysis  of  the  algorithm 


The  first  task  of  this  analysis  is  to  examine  the 
meaning  of  the  quantity  o  in  (3.  7),  which  is  the  steepest 
rate  of  descent  relative  to  the  \  ,  as  it  relates  to  the 
geometry  of  the  x-space. 

We  will  show  that  a  large  p  guarantees  a  good  move, 
and  a  small  o  indicates  that  we  are  near  the  solution.  These 
facts.  proved  as  Lemmas  B  and  C.  immediately  yield  the 
convergence  theorem.  Lemma  C  may  fairly  be  regarded  as 
difficult. 


The  second  task.  very  much  easier,  is  to  establish 
the  facts  about  the  tests  involving  the  value  which  we  used  in 
the  algorithm.  This  we  do  in  Lemma  D,  after  quoting  a 


principle  due  to  Lemke. 
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Put 


X 


j 


j  =  1 . q 


(4.  1) 


and  then 


=  [  ^  (x'J)  1 


2  ,1/2 


(4.  2) 


We  suppose  that  A  >  0  ;  otherwise  the  problem  is  trivial. 


LEMMA  A.  Denote  by  ds  and  dcr  the  lengths  of  the  same 
infinitesimal  segment  on  £  ,  measured  relative  to  the  x-space  and 
A  -  space  respectively.  Then 


ds 

da 


<  A  . 


(4.  3) 


PROOF.  Fix  on  any  and  corresponding  x^  .  Consider 
an  admissible  direction  issuing  in  ^  from  .  i.  e.  a  unit  vector 
g  with 


Move  a  distance  <7  >  0  from  A  along  g  .  The  corresponding 


distance  relative  to  x  is 

\P  q 

s(a)  =  o[\  ( y  ^jxi^2 •  (-1. 5) 

i=l  j=l 

Using  (4.4),  we  may  replace  the  term  xj?  in  (4.  5)  by  • 

Call  the  result  (4.  5)'  .  From  the  Schwarz  inequality, 

q  vq 

[  ^  <  2^  ( JL) £ )  2  .  i  =  1 . P  .  (4.  6) 

j=l  J=1 

On  putting  this  into  (4.  5)'  and  reversing  the  order  of  summation, 
we  get 


s(a)  <  t  cr  . 


(4.  7) 


which  proves  (4  .3)  .  z 

Now  we  will  prove  that  a  large  p  guarantees  a  good  move. 

LEMMA  B.  If  3  >  0  ,  a  move  in  the  A -space  direction 

2  2 

g  given  by  (3,  5)  will  yield  a  decrease  of  at  least  0  /4&  in  the 
squared  distance  to  the  origin. 
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PROOF.  The  directional  derivative  of  Q  in  the  direction 
g.  relative  to  \  . 


dQ 
d  a 


(4.  8) 


We  may  rewrite  this  as 


dQ  ds 
ds  *  d  (7 


(4.9) 


By  Lemma  A,  i.  e.  (4.  3).  this  implies  that  the  directional 
derivative  of  Q  in  the  x-direction  corresponding  to  g  satisfies 


dQ 

ds 


< 


o  /  i 


(4. 10) 


Supposing  that  dQ/ds  in  that  direction  at  x^  is  exactly  -  '  .  we 
see  that  the  formula  for  the  squared  distance  along  it  is 


Q  =  Q  -is  +  s2 

3  0  * 


(4.  ID 


s  being  the  distance  in  relative  to  x  .  Here  the  reader 
should  recall  that  the  second  directional  derivative  of  the 
squared-distance  function  Q  ,  relative  to  x,  in  any  direction, 
is  2.  The  minimum  of  Q  is  attained  at  s  =  l  /2  .  and  its 
value  is  QQ  -  ;2/4  .  Since  ;>  p/£  .  the  lemma  is  proved. 
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We  note  an  example  where  this  estimate  is  exact. 

I  2 

Suppose  p  =  q  =  2,  x  =(1.1)  ,  x  =  (3,1),  so  that  ^  isthe 
line  y  =  l  .  Since  x  =  (2.  1),  we  get  <1  =  JZ  .  The  direction 
numbers  (3.  3)  are 

2(x.  1)  •  ((2,1)  -  (l,  1)  )  =  2(x,  1)  .  (1.  0)  =  2x  (4.  12) 

—  2  2  2 
and  -2x  .  So  p  =  2,/2  x  ,  and  we  get  o  /4a  =  x  . 

The  reader  will  in  fact  trivially  verify  that  the  estimate 

2  2 

2  /4a  is  exact  whenever  j£  is  a  line.  However  that  is  the  only 
such  case.  Note  that  (3.  6)  may  be  rewritten  as 

c  =  2[  l  (  (x-c)  •  xj  )2  ll/2  .  (4.13) 

By  the  Schwarz  inequality  . 

c  <  2  |x  «  c  |  a  •  (4. 14) 

equality  holding  only  if  x-c  is  parallel  to  jr1  a  x  -  x^  for  every 

j  ,  i.  e,  ^  is  a  line.  Othe  rwise  o<2|r-clA-  i*e. 

2  2  2 

p  /4a  <  (x-c)  ,  and  the  estimate  is  short. 

Next  we  show  that  a  small  2  implies  that  the  solution  is 


nearby.  We  first  need  some  notation. 
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Identify  a  maximal  linearly  independent  collection 

of  the  iji  ,  and  denote  it  by  {  ju^}  ^  ^  •  Denote 

T 

the  corresponding  Kx  p  matrix  by  x  ,  and  put  Q  =  u  x 
The  K  y  K  matrix  fl  is  then  positive  definite.  Denote  its 
smallest  eigenvalue  by  a 

LEMMA  C.  The  distance  to  the  solution  satisfies 


|x  -  c  |  <  pi /2a  . 


(4. 15) 


PROOF.  Suppose  that  x  and  c  correspond  to  X  and  X 
Then  we  may  write 


J 


X  -  c  =  u  •  X 

J 

j  =  l 


where  .  .  =  X.  -  X-  .  We  could  do  this  because  the  j.  sum  to 
J  J  J  J 

zero.  Now  consider  a  j  for  which  does  not  lie  in  the  basis 
k  .  't'i _  j 


(4. 16) 


set  {  j;l'l  _  ^  p,  .  Then  xJ  is  some  linear  combination 

k 

of  the  x  .  which  we  may  substitute  into  (4.  16).  After  doing 
this  for  each  such  j.  we  have  a  representation 

K 

\  k  k 
x  -  c  =  /  v  x 

k  =  1 

in  which  the  coefficients  no  longer  add  to  zero;  we  have  not 


(4. 17) 
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altered  x  .  By  the  choice  of  the  jj  .  this  representation  is 
unique. 

Put 

9k  =  2(x-  c)  •  juk  .  k  =  l . K  .  (4.18) 

Then.  from  (4.17), 

K 

ak  =  2  1  vk.'JJk,uk'  •  k  =  l - -  K  .  (4.  19) 

k'  =  1 

Now  this  may  be  rewritten  as 

3  =  2flv  ,  (4.20) 

•  .  [ 

where  9  =  (3  ^ . 3  p.)  an^  v  =  (v^.  •  .  .  .v^)  •  Since  f. 

has  the  largest  eigenvalue  a'*’  -  this  implies  that 

|v  |  ^  a-1  |9  i  n  -  (4.2D 

Now  recall  the  alternate  representation  (4.13)  for  p  .  It  follows 
from  that  that  |g  |  <  p  .  Hence  (4.  21)  implies  that 


)v  |  <  p  /2a  . 


(4.  22) 
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Now  from  (4.17).  the  Schwarz  inequality.  and  the  definition 
(4.  2),  we  get 


|  x  -  c  |  <  |  v  |  A  •  (4.2  3) 

The  assertion  (4.15)  of  the  lemma  now  follows  from  (4.22)  and 
(4.23)  .  2 


We  are  now  ready  to  prove  our  convergence  theorem. 


CONVERGENCE  THEOREM,  E?ch  move  with  x  ^  c 
produces  a  new  point  x'  with 


x'  -  c 


-cc2/214 


IX  -  c 


(4.  24) 


whe re  a  is  the  smallest  eigenvalue  and  A  •  whe re  A  is  given 
by  (4.  2),  the  trace  of  the  matrix  Q  . 


PROOF.  By  (4.15)  and  Lemma  B,  the  move  decreases 
the  squared  distance  by  at  least 


.2,  ,2 
4i  (x  -  c) 

2  X 
A 


(4.  25) 


□ 


(4.24)  follows  immediately  from  this  and  the  inequality  l  -u  <  e 
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That  the  process  converges  is  obvious  from  this 
theorem.  But  it  gives  no  useful  information  on  the  speed 
of  convergence.  That  smallest  eigenvalue  a,  could  indeed 
be  much  smaller  than  the  trace.  On  the  other  hand,  quite  a 
few  extreme  estimates  went  into  the  derivation  of  (4.  24).  How 
fast  the  algorithm  converges  must  be  tested  by  experience. 

We  now  turn  to  the  second  task  of  this  section.  the 
analysis  of  the  tests  involving  the  value.  We  have  first  to 
quote  a  variant  of  a  principle  due  to  C.  E.  Lemke  . 

LEMKE'S  CLOSEST-POINT  PRINCIPLE  FOR  EQUALITY 

CONSTRAINTS  [Lemke  [l],  196 1  ]  .  Consider  the  set  B  oi  all 

V 

linear  combinations  b  =  '  u  .  b  .  of  the  vectors  C  .  .  There 

- •  j  j  -  j  - 

are  two  possibilities.  It  may  be  that  a  f  B  ,  in  which  case 
2  •  Y  -  0  for  all  y  satisfying  (1,  2)  ,  Or  else  :  |  B  ,  In  this  case 
the  re  is  a  point  b  ( 2 )  of_ B  closest  to  2  .  Put 

v  =  | -2  -  b  (a, )  j  .  v  =  (  l/v)  (  2  -  b  ( 2 ))  .  (4.  2  6) 

Then  y  solves  the  problem  <1,  !)-(!.  2).  and  the  value  is  v  ;:)  . 

*)  Lemke's  formulation  had  dj  *y^.O  instead  of  the  equalities 
we  have  at  (1.2)  .  His  principle  then  requires  the  Uj  to  be  nonnegative-, 
everything  else  is  the  same. 
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The  proof  of  this  principle  is  trivial;  one  simply 
writes  out  the  formula  for  the  squared  distance  in  terms  of 
u  and  differentiates.  But  it  does  not  seem  to  have  been 
noticed  before  Le  mke. 

LEMMA  D.  For  the  y  of  (3.2)  ,  always 

3*  y  =  I h |  >  V  .  (4.  27) 

PROOF.  As  to  the  equality,  first  observe  that 

x  •  h  =  j  -  x2/x2  =  1-1=0.  (4.  28) 

a  calculation  similar  to  (2.  6)  .  Hence 

2  •  v  =  (1/ I  h  | )  a.h  =  (1  /  |  h  | )  (  ^  -  x/x2)  .  h  =  (l  /  |  h  I )  h2  =  |  h  |  .  (4.  2  9) 

As  to  the  inequality,  it  is  evident  that  the  linear  manifold  *f  is 
a  (proper)  subset  of  Lemke's  cone  B,  and  so  |h|  a  distance 
|  3  -  b  I  with  b  $  B  .  The  inequality  thus  follows  from  Lemke's 
principle.  “ 

So  (l.  3)  in  fact  holds  always  without  the  s;  .  and  the 
test  | h |  <  e  in  the  algorithm  is  correct. 
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§  5.  On  minimization  in  a  class  of  quadratic  forma 


The  idea  of  this  section  derives  from  Lemke's  principle, 
which  we  now  write  out  somewhat  more  explicitly.  For  a  g  R^ 
put 


=  .( 


u 


b..  u- 
Ji  J 


i  =  1 . p 


(5.  1) 


and  then 


3  ^ )  =  3  (u  )  •  3(u  )  •  (5.  2) 

the  dot  product  being  in  R^  .  The  principle  says  that  the  problem 
of  steepest  ascent  is  equivalent  to  the  problem  of  minimizing  the 
quadratic  form  2( «. )  . 

Now  we  may  consider  the  transformation  a  -*  3  of  (5.1) 
as  mapping  the  "base  space"  of  the  "Lagrange  multiplie  rs "  onto 
the  cone  B  in  the  "image  space"  R*3  of  the  x.  The  difficulty  in 
trying  to  work  in  the  base  space  is  that  there  is  no  intrinsic 
connections  between  the  local  (metric)  geometries.  Steepest 
ascents,  or  even  good  ascents,  do  not  correspond.  For  instance. 
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in  the  original  steepest-ascent  problem  (l.  1)-(1.  2),  multiplying 
one  of  the  b j  's  through  by  .  3ay.  2  .  does  not  change  the 
problem  at  all,  but  it  considerably  affects  the  geometry  of  the 
base  space.  It  was  only  after  considerable  efforts  to  make 
schemes  based  on  the  base  space  work  that  the  author  came  to 
the  idea  of  the  linear  submanifold  ^  of  B,  as  a  way  of  getting 
at  "steepest  feasible  ascent"  in  the  image  space  directly.  There 
are  intrinsic  relationships  between  the  natural  metric  x-geometry 
and  the  metric  X-geometry  of  the  parameters,  as  we  showed 
above  in  Lemmas  A-C.  That  is  why  our  algorithm  should  work 
well  for  the  steepest-ascent  problem. 

And  now  we  come  to  out  point.  For  the  same  reasons, 
it  should  work  well  on  the  problem  of  minimizing  any  quadratic 
form  given  as  the  sum  of  squares  of  linear  forms,  as  is  the  3(  -  ) 
of  (5.  l)-(5.  2)  ,  in  all  of  R1^  without  boundaries  ,  regardless  of 
provenance.  All  one  has  to  do  is  treat  it  as  a  steepest-ascent 
problem,  apply  our  algorithm  to  obtain  a  sufficiently  good  set 

[Xj}  ,  and  then  put  Qj  =  g  Xj  .  j  =  l . q  .  the  tL  being 

given  by  (2.  1). 


So  we  have  turned  Lemke  's  principle  around. 
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ENCLOSURE  B 

A  VARIANT  OF  KARMARKAR’S  LINEAR  PROGRAMMING 
ALGORITHM  NOT  REQUIRING  PROJECTION  ONTO  THE 

NULL  SPACE 


BY  JOHN  M.  DANSKIN 


ABSTRACT 


Recently,  N,  Karmarkar  proposed  to  solve  a  linear 
program  by  mapping  the  problem  projectively  into  a  simplex, 
using  further  projective  transformations  to  keep  the  current 
iteration  point  at  the  barycenter  of  the  simplex.  He  finds  the 
direction  of  steepest  descent  from  that  barycenter  by  projecting 
his  objective  vector  onto  the  feasible  (null)  space.  This  method, 
which  requires  the  inversion  of  a  matrix,  has  led  to  severe 
difficulties  of  speed  and  accuracy  in  inverting  the  matrix,  of 
finding  an  initial  point,  and  questions  as  to  whether  convergence 
has  in  fact  been  proved. 

The  present  paper  uses  Karmarkar's  simplex,  and  his 
projective  transformations,  but  is  otherwise  very  different  from 
his.  The  method,  which  involves  bringing  down  a  number  of 
"deficits"  in  a  "balanced"  way,  requires  no  projection  onto  any 
null  space,  no  matrix  inversion,  no  operations  affecting  the 
sparsity  of  the  matrix.  It  has  no  problems  with  accuracy. 

There  is  a  classical,  real-number,  proof  of  convergence, 
not  involving  bounds  based  on  the  number  of  bits  required  to  state 
the  problem.  A  FORTRAN  program  is  being  prepared  for  testing. 


We  do  not  offer  a  complexity -type  estimate  of  the 
speed  of  the  algorithm,  because  we  do  not  have  one.  We 
do  not  know  how  many  steps  the  "inside"  steepest-ascent 
algorithm  requires  to  achieve  given  accuracy,  because  the 
convergence  estimate  for  that  algorithm,  given  at  (4.24)  in 
[l],  contains  quantities  very  difficult  to  estimate.  What  we 
do  have  is  a  complete  proof  of  the  convergence  of  the  whole 
process.  The  speed  of  this  algorithm  can  only  be  tested  by 


5  l.  Introduction 


This  paper  presents  a  new  version  of  Karmarkar's 
approach  [z,  3]  to  the  solution  of  a  linear  program  by  mapping 
the  problem  into  a  simplex  and  working  thenceforth  in  the 
simplex. 


Our  method  has  none  of  the  feasibility  or  proof-of- 
convergence  problems  that  have  arisen  in  Karmarkar's  approach. 
Furthermore,  it  has  no  projection  onto  the  null  space  and  no 
matrix  inversion. 

It  works  by  setting  up  a  system  of  "deficits"  in  the 
satisfaction  of  the  conditions  for  a  solution  of  an  LP,  including 
the  feasibility  conditions.  We  introduce  new  variables, 
analogous  to  the  slack  variables  of  the  simplex  method  but  not 
really  slack  variables,  so  as  to  make  the  deficits  all  equal  and 
positive  at  the  outset.  Working  then  in  Karmarkar's  simplex, 
we  find  a  direction  in  which  the  deficits  all  go  down  by  guaranteed 
amounts,  in  such  a  way  that  the  ratio  of  the  largest  to  the  smallest 
deficit  never  exceeds  5/3  . 


We  find  that  direction  by  a  new  geometric  method 
for  finding  approximate  steepest -ascent  subject  to  equality 
constraints,  developed  [l]  for  the  present  purpose.  The 
constraints  have  to  be  met  only  very  loosely:  see  (4.9)  and 
(4.10).  The  sparseness  of  the  matrix  is  not  affected.  While 
some  precision  is  important  in  setting  up  the  linear  manifold 
s£  on  which  the  method  of  [l]  works,  there  is  no  accumulation 
ot  errors.  These  are  in  fact  self-correcting,  as  we  show  in 
5  4. 

At  the  time  of  this  writing,  the  method  has  not  been 
tested.  A  FORTRAN  program  is  in  an  advanced  state  of 
preparation,  and  should  be  working  by  July.  While  the  theory 
presented  in  this  paper  is  complete  and  ready  for  refereeing, 
it  will  not  be  submitted  for  publication  until  the  method  has  been 
tested  against  commercial  fast  simplex  methods  like  Tomlin's 
WHIZZARD,  under  proper  conditions,  and  we  have  comparisons 
to  report.  Meanwhile,  this  paper  is  to  be  regarded  as  a  private 
working  document,  for  the  author's  use  in  preparing  his  program, 
as  documentation  in  proposals,  and  for  those  of  his  colleagues 
who  wish  to  read  it. 


The  organization  is  as  follows.  §2  states  the  problem, 
defines  the  deficits,  and  states  our  objectives.  ?  3  explains  our 
approach  to  the  simplex,  in  rather  more  detail  and  with  an 
attitude  rather  different  from  Karmarkar's.  He,  for  instance, 
has  only  one  simplex,  and  one  "potential  function"  .  We  have 
infinitely  many  simplexes,  and  infinitely  many  "surrogate 
functions"  .  The  main  result  of  that  section  is  the  estimate 
(3.26)  for  the  current  deficit,  given  in  the  original  space: 
Karmarkar  has  given  no  such  estimate.  That  estimate  has  on 
its  right  hand  side  two  quantities,  an  jj  that  we  have  to  prove 
less  than  unity,  and  a  "geometric  mean"  of  the  coordinates  of 
the  current  position  point  in  the  original  space,  which  we  have 
to  prove  bounded. 

In  §4  we  show  how  to  move  so  as  to  get  an  improvement 
while  preserving  the  5/3  ratio  between  the  largest  and  smallest 
deficits,  a  central  aspect  of  our  approach.  In  §  5  we  show  that 
the  move  of  §4  yields  a  guaranteed  gain,  and  estimate  ju  at 
(5.19)  .  In  §  6  we  prove,  by  contradiction,  that  the  "geometric 
mean"  appearing  in  (3,26)  is  bounded,  and  thus,  with  the  results 
of  (I  5  on  rjj  ,  complete  the  convergence  proof.  The  final  §7 
outlines  our  routine. 


§  2.  Statement  of  the  problem 


We  consider  the  LP 


P  v 

Maximize  C  •  X  =  \  C.X.  subject  to  )  A..X.  <B..  j  =  1 

-  L  i  i  - 1 -  L  ij  i  "  ]  J 

i  =  l 


P 

i  =  1 


•  •  -  q 

(2.  n 


and  its  dual 


Minimize 


B  •  Y  =  y  B  ^  Yj  subject  to  T  A„  Y^  >  ,  i  =  1.  .  . .  ,  p 

j  =  l  j  =  l  (2.2) 


in  which  X  >  0  ,  Y  >  0  .  The  primal  variable  X  is  said  to  be 
feasible  if  it  satisfies  the  inequalities  in  (2.  1),  and  the  dual  variable 
Y  feasible  if  it  satisfies  the  inequalities  in  (2.  2).  It  is  trivial  that 
if  X  and  Y  are  both  feasible,  then  C  *X  <  B  »Y  ;  and  a  necessary 
and  sufficient  condition  that  the  pair  (X,  Y)  solve  the  problems  (2.1)- 
(2.2)  is  that  C  .X  =  B  .Y  . 


The  standard  approach  to  (2,l)-(2.  2)  is,  first,  to  find 
a  feasible  X.  This,  the  "first  phase",  requires  the  solution  of 
an  auxiliary  linear  program.  Then,  in  the  "second  phas  V, 
keeping  X  always  feasible,  one  seeks  to  increase  C«  X  .  In  the 
simplex  method,  one  arrives  eventually  at  a  "basic  optimal 
solution",  in  terms  of  a  "basis" 


.  The  test  for  optimality  is 


made  in  terms  of  this  basis;  and  the  solution  Y  to  the  dual 
problem  is  obtained  automatically  from  the  operations  needed 
to  solve  the  primal. 

Our  approach  is  wholly  ditferent.  'Ye  never  have  a 
feasible  X  or  Y.  Rather.  we  have  at  any  time  some  nonnegative 
pair  (X,  Y),  and  associated  with  that  pair  a  collection  ot  "deficits", 
amounts  by  which  the  inequalities  in  (2.1)  and  (2.2),  and  the 
inequality  C*X  >  B  •  Y  ,  are  violated.  If  none  of  them  is 
violated,  then  we  have,  by  what  was  said  above,  a  classical 
solution.  If  several  of  them  are  violated,  we  seek  to  bring  the 
violations.  the  deficits.  down.  For  us.  a  solution  is  a  pair 
(X,Y)  for  which  none  of  the  deficits  exceeds  some  prescribed 
small  positive  number.  And  we  do  not  attempt  to  provide  a 
basis . 

A  basic  difficulty  with  any  sch;me  for  reducing  deficits 
is  that  a  move  intended  to  reduce  the  deficits  at  hand  may 
introduce  new  ones.  We  get  around  this  by  arranging  things  so 
that  everything  is  always  in  deficit.  We  introduce  new 

nonnegative  variables  U.  ,  i  =  1 . p;  V.  .  j  =  1 . q;  and 

W.  and  put 


I 


*i = 


q 

)  A..Y.  +  U.  +  C. 

Z-,  ij  j  i  i 

j  =  l 


1  —  1,  .  .  .  i  p  • 


P 

> 

i  =  1 


A. .  X. 
ij  1 


+  V.  -  B. 
J  J 


j  =  i . q 


(2.  3) 


(2.4) 


A  =  -  l  SXx  +  L  BjYj  +  W  •  (2’5) 

i=l  j=l 

At  the  outset  we  choose  the  U.  ,  V.  ,  and  W  are  all  positive 

i  J 

and  equal.  We  from  then  on  move  in  such  a  way  that  always 


A; 


Aj 


[  ^ 
L  4 


5a. 

4 


(2.  6) 


That  this  can  be  accomplished,  while  driving  A  (generally)  down, 
is  nontrivial,  and  a  principal  element  of  our  method.  See  §4  . 


So  we  are  now  operating  in  the  space  of  the  (X,  Y ,  U,  V ,  W) , 
in  the  nonnegative  orthant  of  p^P  +  ^q+  ^  ,  Our  objective,  given 
a  positive  e  ,  is  to  find  aset(X,Y,U,V,W)  for  which  (2.  b)  is 
satisfied  and 


0  <  a  <  e  . 


(2.  T) 


We  are  now  ready  to  transfer  the  problem  to  the  Karmarkar 


simplex. 


I 

I 


0  ,  ,  0  , 

5.  /z  .  and  5  /z  >  where 


°(?)  = .  v 

j  =  l 


6j(?) 


A.  .  Y°y.  +  U?u.  +  C.z  ,  i  =1 _ _  p  ; 

ij  J  J  1  1  i  F 


(3.  5) 


5j(?)=  y  A..X°x.  +V-^'BjZ(j=l . q;  (3.6) 

i  =  1 


5 ° (  »)  =  -  ^  C.  X°  x.  +  )  B.  Y°  y. 

i  l  l  U  j  j  7 } 

i=l  j=l 


+  W0^  . 


(3.  7) 


For  c  strictly  interior  to  £  ,  we  put 


¥?  (  ? )  =  6  ?  (  ? )  /  ? )  ,  i  =  1 . p  ;  -»?( | )  =  5  ?  (  § )  /g(? )  .  j  =  1 . q 


(3.  8) 


sV)  =  6°(?)/^(»)  . 


where  g(%)  is  the  geometric  mean  function 


^(?) 


v  l/(m  + 1) 
’?m+I) 


(3.  9) 


The  functions  rj?  ,  *0?  .  and  -jP  are  then  homogeneous  of  degree 

zero.  Karmarkar  calls  them  (their  logarithms)  "potential  functions"  . 
I  prefer  to  call  them  "surrogate  functions"  . 


(X,  Y,  U,  V,  W)  .  Choose 


Pat  m  =  2p  +2q  +1  and  write  H= 

a  f  Rm  with  hP  >  0  ,  k  =  1 . m,  and  so  that  all  the 

~  '  k  1  J 

and  A  of  (2.  3)-(2.  5)  are  equal.  For  any  E  6  F™  put 

m 

=  rn  + 1  =  2  =  1/(1  2  I  VHk  1  ’  (3'1) 

k  =  1 

and  then 


k  —  It . .  •  f  nc  . 


(3.2) 


These  formulas  define  a  transformation  .  called  the  Karmarkar 
transformation  ,  of  R™  into  the  unit  simplex 


I  :  +  *  ,  .  =  1  ,  s,  >  0  ,  k  =  1 . m+1  (3. 3) 

-1  m+1  k  — 

in  Rn‘  *  .  Its  inverse,  defined  for  +  ^  =  z  >  0  ,  is 


(T°fl 


k  ~k  '  3  m  + l 


k  =  l. 


m 


(3.  4) 


functions 


We  will  use  the  obvious  notation  §  = 
A^,  A-  «  and  A  transform  unde r  T® 


(x,  y,  u.  'J.W,  z)  • 
into  the  functions 


The 


5 ?/z  . 


The  transformation  T°  has  mapped  the  starting  point 
H°  <=  into  the  barycenter  §  of  Z  ,  all  of  whose  components 

are  l/(m  +  l)  .  Moves,  in  Karmarkar's  scheme,  are  made  as 

4 

follows.  One  distinguishes  some  *  interior  to  £  ,  in  some  sense 
better  than  »  .  One  then  applies  the  transformation 


*k/gk 

Xv?! 


,  k  =  1 . m  +  1  , 


(3.  10) 


which  maps  ?  into  ?  .  Karmarkar  thinks  of  one  simplex  £  , 


with  one  potential  function  (really  an  equivalence  class)  defined 


on  it.  I  prefer  to  think  of  replicas  of  I  ,  so  that  the  original 
simplex,  into  which  T^  maps,  is  •  and  the  first  of  the 
transformations  t^  maps  iP  onto  I*  .  Put  E*  =  (T®)  )  . 

1  0  4^ 

Then  the  product  transformation  T  =  T  t  ,  mapping  R  x 
into  ,  is  given  by  the  formulas  (3.  1)  —  ( 3.  2)  with  E^  replaced 
by  E*  ,  and  it  maps  E^  into  the  barycenter  of  E  .  In  general 


suppose  that  N>  1  and  that  we  have  defined  a  transformation 

T^  "  ^  mapping  R1^1  into  a  replica  Z^  *  of  Z  ,  as  a  mapping 

of  the  type  (3.  l)-(3.  2)  with  replaced  by  a  E^  *  6  Rm  having 

4  . 

all  its  components  positive.  Suppose  a  distinguished  5  interior 

N  -  1  #  N  -  1  . 

to  E  is  at  hand,  and  that  the  inverse  of  *  under  T  is 

S^1 .  Then  E^  has  all  its  components  positive.  Now  use  (3.10) 

N  -  1  N 

to  define  a  mapping  of  Z  onto  a  further  replica  Z  of  [  , 

N  N  .  1  #  N  m  N 

and  then  put  T  =  T  1 1  .  T  then  maps  R  +  into  Z  ,  and 


0  N 

it  is  given  by  (3.  1) -(3.  2)  with  S  replaced  by  ■=  .  This 

N 

completes  the  inductive  definition  of  the  T  ,  except  for  the 

# 

precise  specification  of  the  distinguished  point  §  ,  which 

we  shall  give  below  in  §  7. 


We  will  need  an  identity,  based  on  the  above  definitions. 

N  #  n  -  1 

We  defined  H  above  to  be  the  inverse  of  ^  under  T  .  Since 

N  -  1 

the  inverse  of  T  is  gotten  from  (3.4)  by  replacing  the  superscript 
0  at  both  of  its  appearances  by  N  -  1  ,  this  means  that 


„N  J  N-l  .  # 

=  s.  * 

—  I  ^ 


'k  ~k 


m  + 1 


k  —  1 , . . .  p  m  . 


(3.11) 


N-l  N 

Now  suppose  *  £  E  and  define  £  £  by  (3.10)  .  On 


w 


riting  a  =  ^  ?k^'<  '  ve  get 


- 

"k  ’k 


(^k'1/?m  +  l)X(?k/CT?k) 


_N-1_  / 

~"k  ’  k  CT  ~  m  -*T 


k  =  1,  .  .  .  ,  m  . 


(3.  12) 


Also  evidently 


£7(  ?  =  S7<  ?>/ . 


(3.  13) 


It  follows  that 


This  identity,  which  Karmarkar  never  stated  explicitly  because 
he  had  only  one  potential  function  and  one  simplex  (the  ratio  on 
the  right  appears  as  the  rather  vaguely  explained  additive  term 


in  his  definition  of  the  potential  function,  additive  because 

he  works  with  the  logarithm)  ,  is  the  fundamental  identity 

of  his  whole  construction,  and  the  reason  why  it  works.  It 

N 

also  explains  why  he  needs  the  surrogate  function  cp  and 

,  •  CN 

not  some  other  function,  say  6 


Observe  the  following  consequence.  Suppose  that 
?  ,  n  e  '  1  ,  5  1  =  ,  n'  =  t#T|  .  Then 


N 

4> 


(V) 


N-  1,-v 

v  (?) 

N-l." 

¥  in) 


(3.17) 


This  is  Karmarkar's  "invariant  projective  c ross -ratio",  to 
which  he  alludes  but  never  makes  explicit.  It  means  that 
one  can  track  the  progress  of  the  algorithm  in  any  of  the 

replicas  of  T,  ,  in  particular  ,  which  we  are  about  to 

N  N 

do.  VVe  note  that  (3.  17)  holds  lor  the  and  cc j  as  well. 


In  what  follows  we  suppose  that  at  each  stage  N=0.  1, .  .  . 

#  N 

we  are  able  to  find  a  distinguished  5  6  1  such  that 


N 

V 


)  <  juV  (?) 


(3.18) 


oj  c  (0,1)  being  independent  of  N  .  Using  this  and  the  apparatus 

_N 

developed  above,  we  find  at  (3.26)  an  estimate  forA(-  )  •  The 


proof  is  close  to  Karmarkar's  (incomplete)  proof  of  conve rgence 
for  the  case  when  the  value  is  zero  in  his  early  manuscript  [2]  . 


We  said  we  would  track  the  progress  of  the  algorithm 


Q 

in  E  .  So  we  put 


»N  =  T°hN  ,  N  =  0,1,... 


(3. 19) 


The  are  now  in  E^  »  with  ^  =  » 


Now  (3.18)  and  the  invariant  cross-ratio  principle 


(3. 17)  imply  that 


.  N-l,  ,  #  .-1  *.  ^  N-l,.#  i-lr, 

'-P  '  )  ?  )  <  JU  3>  (t  ?  )  > 


(3.  20) 


#  # 

where  t^  ^  denotes  the  mapping  t  of  (3.10)  when  it  is  from 

N-l  N  N 

E  onto  E  .  That  is,  the  inequality  (3.18)  on  £  implies 

N-l 

the  same  inequality  on  E  .  but  applied  to  the  predecessors. 

Walking  this  back  to  E^  and  recalling  the  definition  of  *  as 

#  N 

the  inverse  of  Z  under  T  ,  we  get 


0,  N  +  l  ^  0,_N 

V  ( §  )  <  JU  £  ( £  )  . 


(3.21) 


Thus  we  have  tracked  the  action  back  to  E^  . 


Now  concatenate 


the  inequalities  (3.21),  with  N  replaced  by  0,1,...,  N-l.  We 


N  0/ff0\  Tsi  —  ni 
cp  ( §  )  <  uu  ;p  ( ?  )  >  N  -  °,  *■» 


(3.  22) 


We  now  have  to  determine  what  this  means  relative  to  the  original 
problem  in  F?1  :  this  is  what  Karmarkar  did  not  do  in  [2]  . 


We  recall  that  ^  =  §  .  From  (3.7)  and  (3.8)  , 


°°(?>  =  -I  c,*0,  +  I  Vj 


+  w°  , 


(3.23) 


which  is  equal  to  the  original  deficit  a(-  )  .  From  the 

0  N  ,  N  . 

definitions  of  6  ,  H  and  »  ,  we  have 


.  N,  0,  Nw  N 

A(  =  )  =  6  ( l  )/z 


(3.  24) 


in  which  z  is  the  last  component  of  -  .  From  this  and  (3.23), 

we  see  that  (3.22)  translates  into 


,_N.  N  ,0.  ,  N.  .  N 

A(-)<  uu  xA(-)  x  •  )  'z 


:3.  25) 


We  calculate  the  last  factor  in  (3.25)  by  substituting  (3.2)  into 
the  definition  (3.  9).  This  gives  us 


•  •  • 


(3.26) 


,_N, 

A(  H  )  < 


^NA(H°)[Ef 


_  N  .  l/(m  +  1) 
*  "m 


[H? 


-0  -I  l/(m  + 1) 


This  is  our  desired  estimate  of  the  deficit  at  stage  N  . 


In  order  to  pass  from  (3.26)  to  a  proof  of  convergence, 
we  have  to  solve  two  problems.  The  first  is  that  of  finding  the 
distinguished  5  in  (3.18)  ,  while  respecting  the  condition 


N  N 
6i  •  6j 


€  [  16^  S^j 


(3.  27) 


corresponding  to  (2.6)  .  That  we  do  in  §4  below.  That  that 
moves  yields  a  guaranteed  jj  £  (0,2)  is  proved  in  §5  :  see 
(5.19)  .  The  denominator  in  (3.26)  gives  us  no  trouble; 
we  could  have  chosen  all  the  initial  to  exceed  unity.  So 
the  second,  and  final,  problem  is  that  of  finding  conditions 
under  which  the  "geometric  mean"  term  in  the  numerator  is 
bounded.  We  prove  in  §6  that  this  is  so  provided  the  set  of 
solutions  of  the  original  LP  problem  (2.  1 )  -  ( 2 .  2)  is  bounded. 


§ 4.  The  balanced  reduction  of  deficits 


We  suppose  ourselves  at  the  barycenter  at  stage  N, 


N  N 

and  that  the  .  6j  .  6  satisfy  the  ratio  condition  (3.28). 

Our  objective  is  to  move  in  such  a  way  as  to  guarantee  a 
decrease  in  6  and  to  preserve  (3, 27)  . 


The  situation  is  substantially  more  complicated  than  it 
is  in  Karmarkar's  algorithm.  He  knows  that  there  is  a  solution 
somewhere  in  the  simplex,  and  not  as  far  as  one  unit  away. 
Hence  there  is  at  least  a  certain  rate  of  decrease  in  the  direction 
towards  the  solution,  and  therefore  at  least  that  much  in  the 
direction  of  steepest  descent. 


What  we  need  is  more  than  just  the  existence  of  a  solution 
somewhere  in  the  simplex.  We  need  not  only  a  pull  downward,  but 
a  pull  towards  correcting  the  ratios  when  they  begin  to  diverge.  What 
we  need  is  to  know  the  existence  of  points  which  are  well,  but  not  too 
well,  down,  and  have  all  the  deficits  equal. 

N  N  N  N  N  N  rr 
To  do  this  ,  we  go  to  the  point  5  =  (X  ,Y  ,U  ,V  ,W  )  £  R  + 

at  hand,  find  the  largest  deficit,  and  adjust  the  variables  Uh  ,  V\  , 

and  W  corresponding  to  the  other  deficits  upwards,  so  as  to  make 

the  deficits  all  equal.  This  gives  us  a  point  H*  ,  whose  image 


in  we  denote  by  g;*  .  Now  there  are  two  possibilities.  We 

consider  the  easy  one  first. 


That  is  the  case  when 

6  N(  » *)  <  5(f  )/2  . 


(4.1) 


In  this  case  we  move  towards  ,  either  to  the  (approximate) 

N 

minimum  of  cp  (-)  or  to  5*  ,  whichever  comes  first.  In 
this  case  the  ratios  will  be  brought  back  towards,  perhaps  all 
the  way  to,  unity.  And  there  will  be  a  guaranteed  proportionate 
decrease:  see  §  5  . 


The  more  difficult  case  is 


N  N  - 

5  (?*)  >  5  (?)  /  2  . 


(4.2) 


We  now  drop  the  supe  rsc  ript  N  and  write  6.  ,  6j  ,  5  for  5-(?), 

6j(^)»  6(?)  .  Now,  on  the  assumption  that  the  original 

problem  has  a  solution,  there  is  a  point  in  I  with  6^,  6j  , 

and  6  all  zero.  Consider  the  line  [  »#,  ?**]  .  There  is 
obviously  on  that  line  a  point  5***  satisfying 


6  •(§***)  =  5j(?***)  =  5(?***)  =  6/2 


(4.  3) 


for  all  i,  j  .  It  will  be  this  unknown  point  we  will  try  to  aim 
at,  not  the  solution. 

Consider  the  line  segment  l  =  [  5  ,  (?***]  .  Denote 
the  length  of  {,  by  d  ,  and  derivatives  along  it  by  primes. 
Then 


-  6  +  d(6!  -  6')  =  0  . 


(4.4) 


Also, 


dV  =  -6/2 


(4.  5) 


On  eliminating  the  d  we  get 


5!  =  6  '(2  5  .  /  6  )  -  1)  . 


(4.  6) 


Similarly  we  derive 


6'  =  6  '(2  6  /  5  -  1)  .  (4.  7) 

j  3 


So:  we  have  proved  that  in  the  case  (4.  2)  there  exists 
a  direction  respecting  the  p  constraints  (4.6),  the  q  constraints 


(4.  7),  and  the  simplex  equation  in  (3.  3),  and  at  the  same  time 
bringing  down  5  at  a  rate  exceeding  6/2  . 

What  we  will  do,  knowing  the  above  fact,  is  to  set 
in  motion  the  steepest-ascent  routine  of  [l]  ,  asking  for  the 
steepest  ascent  of  -6  which  respects  the  constraints  (4.6), 

(4.  7),  and  the  simplex  constraint.  This  routine,  being 
iterative  and  approximate,  will  never,  except  by  remote  chance, 
ever  achieve  any  of  those  equations  exactly.  But  we  do  not  need 
exactness,  except  in  the  simplex  constraint.  At  each  step  of  the 
iteration,  we  will  "plaster"  the  current  direction  vector  y'  onto 
the  simplex  by  subtracting,  from  each  component,  the  average 
of  the  components,  and  then  renormalizing.  We  then  test  the 
inequality 


-5'  >  |h|/2  ,  (4.8) 

in  which  6'  is  the  rate  corresponding  to  the  plastered  y  ,  and  h 
is  given  by  (3.1)  in  [l]  .  This  will  assure  that  the  plastered 
vector  is  yielding  more  than  half  the  maximum  rate  of  descent 
subject  to  the  constraints;  we  explain  this  at  the  beginning  of 
§5  .  If  (4.8)  passes,  we  test  the  inequalities 

6|/6'  €  [2^/6"  -  5/4,  Zfi".  /  6*  -  3/4]  (4.9) 


and 


(4.  10) 


51/  6'  €  [  2575  -  5/4  ,  25  76  -  3/4]  . 

We  will  accept  the  candidate  plastered  direction  if  all  these 
inequalities  are  satisfied.  Suppose  an  i  is  accepted  with 

6 ! /  6 '  =  2575  -at  (4.  11) 

where  a  €  [3/4,  5/4]  .  Suppose  one  moves  in  the  candidate 
direction,  with  5'  =  -s  .  Then  6!  =  -s(26./6  -  a  »  so 
that  at  a  given  distance  d 

5^(d)  =  5^  -  sd(2576  -  a)  .  (4.12) 

Choose  d*  (which  might  be  greater  than  the  distance  to  the  boundary) 
so  that  sd*  =  5/2  .  Then  5^(d:;;)  =  aS  /2  ,  i.  e. 

5.(d*)/5(d*)  =  a  .  (4.13) 

Since  the  ratio  5.  (d)  /  5(d)  is  monotone,  and  since  it  lies  in  [3/4,  5/4] 
at  both  d  =  0  and  d  =  d*  ,  it  therefore  lines  on  that  interval  for  any 
d  e  [o,d*]  .  The  same  obviously  goes  for'any  j.  It  follows  that  the 
ratio  condition  (3.27)  will  be  respected  by  any  move  that  does  not 
reduce  5  by  more  than  half.  a 


§  5.  Estimation  of  the  improvement  on  a  move 


The  purpose  of  this  section  is  to  estimate  the 

N 

guaranteed  proportionate  reduction  in  ip  (5)  on  a  move  of 
one  of  the  two  types  described  in  §4  ,  and  to  discuss  step 
size. 


We  have  first  to  discuss  the  effect  of  the  "plastering" 
of  the  direction  vector  to  the  simplex,  and  the  meaning  of 
the  test  (4.  8)  ,  both  occurring  in  the  "difficult"  case  (4.  2). 

As  to  the  plastering,  for  any  positive  £  the  routine  of  [l] 
will  eventually  produce  a  y'  with  |  s  |  <  £  ,  where 

m  + 1 

3  =  Z.  Yk  ’  (5<1) 

k  =  1 

Subtracting  off  s/(m+l)  from  each  y^  gives  us  a  vector 

1/2 

y"  satisfying  |y"  ~y'|  <e/(m+l)  '  .  And  the  length  of 

v"  differs  from  unity  by  less  than  3£^/2(m-Ll)  .  So  the 

1  /2 

plastered  y  differs  from  y1  by  only  about  £/(m+l) 

Hence  the  additional  conditions  (4.9)-(4.10)  will  certainly 
eventually  be  met.  The  operation  of  plastering  is  therefore 
almost  free  of  charge. 


We  now  come  to  the  question  of  the  meaning  of  the 

test  (4.  8)  .  We  need  to  have  a  relationship  between  the  rate 

of  descent  X  (wemean\>0)  provided  by  the  plastered  y 

—  N  —  0 

and  the  current  value  of  5=6  (?)  .  Denote  by  X  the 

exact  steepest  rate  of  descent  for  the  constrained  problem. 

We  do  this  in  three  steps.  We  first  relate  X  to  the  X'  provided 
by  the  unplastered  vector  y'  •  This  is  accomplished  by  (4.  8), 
which  tells  us  that  X  >X'/2  .  We  then  relate  X  to  xV  and 
finally  X^  to  6  . 

As  to  the  second  relation,  it  is  a  characteristic  of 
the  steepest  ascent  algorithm  that  always  X'  >  X^  .  This  we 
proved  at  Lemma  D  in  §4  of  [l]  .  What  happens  in  that 
algorithm  is  that  the  current  value  comes  down  as  the  deviations 
from  the  constraints  decrease. 

As  to  the  final  relation,  recall  that  in  this  case  there 
exists  a  -***  satisfying  (4.  3)  .  6  decreases  by  exactly  5/2 

on  the  segment  [  »  ,  ?***]  .  And  the  length  of  that  segment  is 
less  than 


R  =  /m/(m  +  1) 


(5.  2) 


the  radius  of  Karmarkar's  circumscribed  sphere.  So  the  rate 
of  descent  on  that  segment  exceeds  6  /2R  Now  the  direction 
of  that  segment  satisfies  by  definition  the  constraints  of  the 
steepest-ascent  problem.  It  follows  that  \^>  5  /2R  .  We 
therefore  have 

X  >F/4R  ,  (5.  3) 

which  is  the  relation  we  need. 

That  explains  the  test  (4.8),  in  the  case  (4.2)  .  In 
the  other  case  (4.  I),  the  direction  we  have  chosen,  the  one 
pointed  straight  towards  ,  takes  6  down  by  at  least  5  / 2 

in  a  distance  less  than  R  .  The  X  for  that  direction  therefore 
exceeds  5  /2R  .  Hence  (5.  3)  holds  in  that  case  as  well.  □ 

The  rest  of  this  argument  is  simply  a  ve ry -much- reworked, 
simplified,  and  in  some  respects  amplified,  version  of  Karmarkar's 
derivation  of  the  corresponding  estimate  in  [2]  .  In  particular  we  go 
1/4  of  the  way  across  the  inscribed  sphere. 

We  have  first  to  write  out  the  simple  properties  of  the 
geometric  mean  function  of  (3.9)  in  the  direction  of  the  move. 


G(t)  =  g{%  +  yt) 


(5.4) 


Then 


G(t)  =  h(t)  . 


(5.  5) 


whe  re 


We  get 


whe  re 


m  + 1 

»«'  *  L  rfe 

k  =  1 


h(t)  =  -  k(t)  , 


m  + 1 


y  k_ 

Jki 


So  h(0)  =  0,  and  h(t)  is  strictly  decreasing.  Next  , 


e<*>-  -•>“»  • 


(5.  9) 


By  the  Schwarz  inequality,  the  quantity  in  brackets  is  nonpositive, 
and  it  can  be  zero  only  if  y  tor  all  k=l,...,m  +  l  ,  a) 

being  some  proportionality  constant.  Since  the  y^  sum  to  zero 
and  the  5  (t)  to  unity,  we  would  then  have  ju  =  0,  80  that  all 

the  y,  are  zero,  impossible  because  y  is  a  unit  vector.  Hence 

K 

the  quantity  in  brackets  is  in  fact  strictly  negative,  so  that  G(t) 
is  strictly  concave. 


r  =  yi/m(m  +  1) 


(5.10) 


This  is  the  radius  of  Karmarkar's  inscribed  sphere. 


We  need  an  estimate  for  h(t)  ,  on  the  interval  [0,  r/4]  . 
We  assume  that  m  >  2  .  Then 


'*<*>  >  ^ 1 


4ym(m  + 1) 


3(m  + 1) 


(5.  11) 


Hence 


m  + 1 


k(t)  <  |  (m+1)2  ^  y2 


=  j  (m  +  l)2 


(5.  12) 


It  follows  that 


0  <  -h(t)  <  (m  +1)2 


(5.13) 


on  [0,  r/4]  .  This  is  the  desired  estimate. 


Now  put 


6  ( g(t>> 
G(t) 


(5.14) 


This  is  the  objective  function  ;p(?)  ,  taken  along  the  line.  Then 


*<*>  =  -  CTO  tx  +  »(.)]. 


(5.15) 


Using  (5,  3)  and  (5.13),  we  estimate  the  second  term  in  brackets: 


0  <  -  hit)  < 


Hence 


5  4R\  9r  ,  ,  ..2 

m+I  <  m  +  1  x  16  m  + 1 


(5.16) 


* (t)  <  -  Te"  x  gTo 


(5.  17) 


Putting  this  together  with  the  definition  (5.  14)  and  using  (5.  3) 


once  again,  we  get 


IX 

'  16 


7 

"  64R  * 


(5. 18) 


[  log  $(t)  ]  < 


1 

x  4R\  ~ 


On  integrating  this  across  [0,  r/4]  ,  we  get 
<f>(r/4)  <  $(0)e"7/256m  . 


Hence  we  would  have  (3. 18)  and  hence  (3.  26),  with  uu  =  e*7/256m  ^ 
if  we  were  to  choose  *'  =  ^(r/4)  .  We  have  thus  achieved  the 
desired  guaranteed  proportional  reduction  in  the  surrogate 
function  $  .  C 


Move  length  .  It  would  be  absurd  to  make  the  above 

# 

choice  of  £  in  computational  practice,  Karmarkar  in  [2],  and 
I  here,  chose  it  only  to  obtain  the  estimate  (5.  13)  for  h(t)  and 
hence  the  guarantee  (5.16)  .  In  fact  the  function  <J>(t)  of  (5.14)  , 
as  the  quotient  of  a  linear  function  and  a  strictly  concave  function, 
has  a  unique  minimum  on  the  line,  which  can  be  many  small 
sphere  radii  out.  It  is  a  trivial  exercise,  requiring  very  little 
computation  compared  to  that  required  in  case  (4.  2)  to  obtain 
the  direction,  to  obtain  an  approximate  minimum  using  a  few 
steps  of  a  halving  process.  It  is  by  such  a  halving  process  that 
I  propose  to  choose  Z #  *)  . 

*)  (.Footnote  next  page] 


.19) 


[Footnote  to  previous  page] 

*)  Karmarkar  had  not,  until  I  caLled  his  attention  to 
it  in  October  1984,  yet  noticed  the  unique  minimum  property 
of  <5(t)  .  Perhaps  his  use  of  the  logarithmic  version  obscured 
that  property.  He  had  been  using  a  "three -point  test"  . 


§  6.  Conve  rgence 


The  principal  objective  of  this  section  is  to  prove 
the  following  assertion. 


LEMMA.  If  the  set  of  classical  solutions  of  the 
original  LP  problem  (2.1)-(2.2)  is  nonempty  and  bounded, 
then  the  ''geometric  mean"  term 

,_N  _  N  l/(m  +  1) 

U1  *  “•  *  am) 

appearing  in  the  estimate  (3.26)  is  bounded. 

PROOF.  We  will  prove  that  if  that  term  is  unbounded, 
then  there  exists  an  unbounded  solution  set  for  the  original  LP. 


Suppose  then  that  the  quantity  (6.  1)  tends  to  infinity 
on  a  subsequence  of  the  N.  Then  it  will  eventually  exceed  unity 
and  therefore  be  less  than  the  true  geometric  mean 
l  •  .  . .  •  )  .  oince  trivially 


_N  1/m 
~m  — 


'“I  •  '  *  “m 


m 


M 


i 


(6.1) 


(6.2) 


we  may  replace  the  factor  (6.1)  in  (3.26)  by  S/m  ,  where  we 
have  written  S  =  +  ...  +  Recalling  the  definition  (2.5), 

we  then  have 


0<  -)  C.x'N  +  S  B.y'N  +  w'N  <  AijNA(H0)/m,  (6. 

L  ii  L  j  j 
i=l  j=l 

where  we  have  written  H  ^  =  5^/S  and  A  =  ( •  •  •  •  •  +  ^  , 

Since  always  A^»  Aj  <  (5/4)  a  .  we  get  similar  estimates  from 
(2.  3)  and  (2.4)  ,  with  an  additional  factor  5/4  on  the  right 
hand  side.  Because  x  €  (0,1)  and  is  constant,  and  because 
everything  else  on  the  right  hand  side  of  (6.  3)  is  constant,  the 
term  between  inequality  signs  in  (6.  3)  tends  to  zero.  This  is 
so  also  for  the  estimates  obtained  from  (2.  3)  and  (2.4)  .  Since 
E  ^  is  on  the  unit  simplex  in  ,  a  subsequence  converges 

to  a  limit  3*  ,  also  on  the  simplex.  We  thus  find  that 


)  A..Y* 
L  ij  j 
j  =  l 


U?=  ,  1  =  1, 


from  (2.  3)  , 


)  A..X*  =  V?  ,  j  =  1,  ...,q 

L  ij  i 

i  =  1 


from  (2.4),  and 


C«X*  =  B*Y*  +  W* 


(6.  6) 


from  (2.  5) 


also 


m 

V 

N  : 

L  ' 
k  =  l 


L 

i  =  1 


X* 

i 


i 

I 


Y  ^ 

i 


L 

i  =  i 


u* 

i 


M 

I 


v* 

j 


+  W*  =  1 


(6.  7) 


Now  suppose  that  the  pair  (X,  Y)  is  a  solution  of  the 
original  LP  (2.1) -{2.  2)  .  Filling  in  with  slacks  U.  and  V.  ,  we 
may  write 


l  A..X. 

U  1J  1 

i  =  1 


q 


j  =  i 


+  V.  =  B.  , 
J  J 

-  u.  =  c.  , 

l  l 

B.Y  =  c*x 


j  -  1, ....  q  t 


l  =  l, . .  . ,  p  » 


(6.  8) 


(6.  9) 

(6.10) 


Now  multiply  the  equations  (6.  8)  by  Y r  and  add.  Taking  account 
of  (6.  4),  we  get 


B.Y-  =  X.U*  +  V •  Y*  . 


Next,  from  (6.9)  and  (6.5),  we  get 


(6.  11) 


C«  X* 


.  V*.Y  -  U.X*  . 


(6. 12) 


Putting  these  together  with  (6.  6),  we  get 


W*  +  X«U*  +  U«X*  +  Y  •  V *  +  V*Y*  =  0  . 


(6. 13) 


Since  W*  and  all  the  dot  products  in  (6.  13)  are  nonnegative,  then 
they  are  all  zero.  Hence,  in  particular,  from  (6.11)  and  (6.12), 


X*  =  B«Y*  =  0  . 


(6.14) 


Now  it  is  not  possible  for  all  the  X*  and  Y*  to  be  zero. 

i  J 

If  this  were  so,  we  would  have  all  the  U:?  eoual  to  zero  from 
(6.  4)  and  all  the  V*  equal  to  zero  from  (6.  5)  .  And  we  have 
just  proved  that  W*  =  0  .  This  would  then  contradict  (6.  7). 


So  suppose  that  some  of  the  Xjj5  are  nonzero,  and 


w  rite 


Xr  =  X  +  tX*  ,  t  >  0  . 


(6.15) 


Then,  by  (6.  8)  and  (6.  5)  , 


L  Aijxi  *  Bj  '  j  = 

i  =  1 


(6.16) 


Hence  X^  is  primal -feasible.  From  (6.10)  and  (6.14), 


C  •  Xr  =  C «  X  =  C  •  Y  . 


(6.  1 


It  follows  that  the  pairs  (X^,  Y)  ,  t  >  0  ,  form  an  unbounded 
solution  set  for  (2.l)-(2.2)  .  We  make  the  corresponding 
construction  if  it  is  that  some  of  the  Y*  are  nonzero.  The 


J 


lemma  is  proved. 


The  above  proof  by  contradiction  does  not  give  any 
indication  of  what  the  bound  on  that  "geometric  mean"  in  (3.26) 
is.  But  it  does  in  fact  provide,  by  asserting  that  some  bound 
exists,  a  polynomial  estimate  of  the  number  of  steps  of  the 
outside  algorithm  required  for  given  accuracy. 


We  however  do  not  have  a  polynomial  estimate  for  the 

number  of  arithmetic  operations  required  to  achieve  given 

accuracy.  This  is  because  the  inside  algorithm,  the  steepest- 

ascent  algorithm  of  [l]  whose  role  is  described  in  §4  above,  has 

an  estimate,  at  (4.24),  involving  the  eigenvalues  of  a  matrix 

_N 

depending  on  2  .  We  do  not  even  know  that  the  whole  series  ] 
converges  (unless  the  solution  is  unique)  .  So  we  have  no  estimate 
of  the  smallest  eigenvalue  a  appearing  there.  All  we  know  is  that 


that  routine  yields,  in  finitely  many  steps,  any  desired  degree 
of  accuracy.  So  we  will  state  the  overall  convergence  result, 
a  consequence  ot  the  estimate  (4.24)  in  [l]  and  of  the  Lemma, 
very  simply  as  follows. 


THEOREM.  The  algorithm  of  this  paper,  which 
incorporates  the  steepest-ascent  algorithm  of  fll,  will  answer 
to  the  objective  set  at  (2.7)  above  in  finitely  many  arithmetic 


§7.  The  routine 


This  section  is  intended  to  provide  a  summary  guide 
to  the  practical  execution  of  the  program. 

We  suppose  available  the  steepest-ascent  algorithm 
of  [l],  whose  action  is  described  there  in  §§2,3  .  In  that 

algorithm  there  is  an  objective  vector  1  ,  defined  in  some 

K  j  K 

R  ,  and  constraints  b  ' ,  also  defined  in  R  ,  with 

f  =  1, .  .  .  ,  L  .  The  original  problem  it  is  cone  'rned  with  is 

that  of  maximizing  i»y  subject  to  the  constraints  |y|  =  1  , 

b^  •  Y  =  0  ,  l  =  1, . . .  ,  L  .  Suppose  that  maximum  is  v. 

The  routine  finds  an  approximation  to  the  solution  of  this 

problem,  in  the  following  sense.  Given  g  positive,  it 

either  states  that  v  <  g  ,  or  else  produces  a  unit  vector 

V  ’  €  R  satisfying  a  •  y'  >  v  and  \bL  •  y'|  <  g  ,  l  -  1 . L  . 

The  main  routine  is  executed  relative  to  strictly 
positive  vectors  H  =  (X,Y,U,V,W)  g  Rm  =  RPxRqx  RPxRqxRl  . 
It  begins  at  a  starting  point  =  (X^,  Y®,  U®,  W®),  determined 

as  follows.  X^  and  Y*^  are  chosen  arbitrarily,  with  all  the  X^  and 
Y*^  strictly  positive.  For  instance  one  might  choose  them  to  be 


all  equal  to  unity.  The  U?  ,  V?  ,  and  W^  are  then  chosen  to 
be  strictly  positive  and  such  that  all  the  deficits  A°  =  aTe°)  , 
A?  -  ATE0)  .  and  A^  -  A(E°)  defined  at  (2.  3) -(2.  5)  are 
strictly  positive  and  equal.  The  routine  is  now  ready  for  the 
execution  of  step  0  . 


At  the  outset  of  step  N,  N  >  0  ,  the  routine  is  at  a 
N 

point  i  ,  all  of  whose  components  are  strictly  positive,  and 

N  N  N  N 

for  which  the  deficits  A-  =A-(~  ),  A-  =A-(E  )»  and 

i  i  j  j 

N  N 

A  =  A(E  )  are  all  positive  and  satisfy  (2.  6)  .  The  problem 

is  to  find  a  direction  for  a  move  on  the  corresponding  simplex 

N 

E  .  We  now  have  to  treat  the  two  possibilities  considered  at 
the  beginning  of  §4  . 


We  begin  by  finding  the  largest  of  the  deficits  A^  , 

N  — 

Aj  »  A  at  hand.  Suppose  its  value  is  A  .  We  then  put 


U?  =  U.  +  a  -  A^»  i  =  1, . .  .  ,  p  :  V*  =  V .  +  a  -  aN  »  j  =  1, .  .  .  ,  q  : 

i  i  i  r  )  J  J 

w*  =  w  +  J  .  an  . 


At  the  point  E*  =  (XN,  YN,  U*,  V*,  W*)  the  deficits  are  now  all 

N  N 

equal.  Denote  its  image  in  I  under  T*  by  §*.  (Recall  that 

N  0  N 

T  is  given  by  (3.  1) -(3.  2)  with  E  replaced  by  2  .)  Now  test 


the  inequality  (4.1)  .  If  it  passes  (which  seems  in  general 
unlikely)  ,  we  get  an  essentially  free  move.  We  take  y  to 
be  the  direction  pointing  to  ,  i.  e.  y  =  (  » *  -  e  )  /  j  £  *  _  *  |  f 
and  pass  to  the  description  of  the  "Move  on  the  simplex",  below 


If  (4.  1)  fails,  we  are  still  guaranteed  a  balanced 
reduction,  but  at  substantially  greater  cost.  The  algorithm 
calls  the  ste epest-ascent  routine,  operating  in  the  space 
R  =  R  M  of  variables  ~  =  (x,  y,  u,  -j,  w,  z)  .  The 

objective  vector  2  is  now  read  off  from  (3.7),  with  0  replaced 
by  N,  as  follows.  We  put 


2k  =  Ck  ^k  «  k  =  !■» .  .  .  ,  p  ;  2^  =  -  ^ ,  h=p  +  l,.,.,p+q; 


(7.2) 


2.  =  0,  k  =  p  +  q  +  1 , . ,  .  ,  2p  +  2q  ;  a 


2p  +  2q  +1 


=  0,a 


2p  +2q  +  2 


There  are  L  =  p  +  q  +  1  constraint  vectors  b'  ,  whose  construction 
requires  some  intermediate  definitions,  as  follows.  We  first 
define  vectors  c1  6  R2P  +  2cl  +  2(  i  =  1, .  . .  ,  p  ,  read  off  from  (3.  5): 


c,  =  G  .  k  =  l, 
k 


1  IN 

p;  c.  =  -  A  Y  ,  k=p  +  l,...,p+q 

k-p  i,  k  -  p  k  -  p  r  '  r  = 


k  =  0  ,  k  =  p  +  q  +  1, 2p  +  2q  +  l  ,  except  that  c^  +  ^  + ; 


C2p  +  2q  +  2  ^i 


c 


(7.  3) 


Next  we  define  vectors  d^  f  R“P  ,  j  -  1,  .  .  .  ,  q  ,  read  off 


from  (3.  6): 


d^  =  A^.  ,  k  =  1, . . .  ,  p  ;  =  0  ,  k  =  p  +  l,...,2p+2q  +  l  , 


except  that  di  ,  ,  .  =  V  .  ;  dl  ,  „  ,  „  =  ~B. 

- -  2p+q+j  3  2p  +  2q+2  j 


Next  we  define  scalars 


(7.  4) 


,  N.  N  ,  .  , 

K .  =  2  /  L 


_  N  ,  N 

P  :  Hj  =  2  £  /£  -  1  ,  j  ^  1, . .  .  ,  q  . 


Finally  we  define 


b*'  ~  c  -  M.t  i  ,  K  -  1,  •  • . ,  p  ;  b>J  -  d*'  p  -  x)  2,4  =  p  +  l,. ..,p  +  q. 

(7.  6) 

These  correspond  to  the  constraints  (4.  6) -(4.  7);  see  the  text  following 
those  formulas.  The  final  constraint  vector  has 


7v+Cl  +  l  =  1  •  k  =  1, .  .  .  ,2p  +  2q +1  ; 


it  is  the  simplex  constraint. 


Each  iteration  of  the  steepest-ascent  algorithm  produces 
a  unit  vector  v1  ,  generally  speaking  not  satisfying  any  of  the 


constraints.  We  "plaster"  it  to  the  simplex  by  subtracting, 
from  each  component,  the  average  of  those  components,  and 
then  renormalizing.  We  denote  the  plastered  unit  vector  by  y  . 
We  first  test  the  inequality  (4.  8),  which  here  reads 


1  *Y  >  |h| /2 


(7.  8) 


where  h  is  the  vector,  used  in  the  construction  of  y'  ,  given 
by  (3.1)  in  [l]  .  We  then  make  the  tests  corresponding  to  (4.  9) 
and  (4.10)  of  this  paper.  They  are: 


b^  •  y  6  [  Hj  -  1/4  ,  +  1/4  ]  ,  t  =  1, .  . .  ,  p  ; 


(7.  9) 


b  *  •  Y  6  [  .  p  ~  1  /4  ,  x|  _  p  +  l  /4  ]  ,  4=p  +  l, .  . .  ,  p  +  q  . 


If  any  of  these  test  fails,  the  steepest-ascent  algorithm  proceeds 
to  its  next  iteration.  According  to  the  theory  (see  §4  ,  especially 
following  (4,  7),  eventually  all  the  tests  must  pass,  and  we  have 
the  desired  direction  for  a  move  in  the  simplex,  bringing  down 
cp(?)  =  6(?)/j(f)  and  respecting  the  requirement  (3.27)  on  the 
condition  that  the  move  is  of  length  not  over  1/2  . 


The  move  on  the  simplex.  Whether  we  are  in  case 


(4.  1)  or  (4.  2),  we  now  have  a  direction  y  of  guaranteed 
balanced  reduction  on  the  simplex.  We  seek  the  approximate 
minimum  for  the  function  $(t)  of  (5.14)  .  We  know  from 
(5.18)  that  this  function  is  still  decreasing  when  t  =  r/4  . 

So  we  set  the  initial  left  endpoint  for  the  halving  process  at 
a  =  r/4  .  As  to  a  right  endpoint,  there  is  first  the  boundary. 

Next,  in  the  case  (4.1)  one  cannot  move  too  far  beyond  5*  , 
precisely  not  beyond  f  +  2(<J*-  *)  =  *  +  2i»*  ,  without 

running  a  chance  of  violating  the  balance  condition  (3.  27)  .  In 
the  case  (4.  2)  one  cannot  move  more  than  1/2  for  the  same 
reason.  If  the  effective  upper  stop  is  now  still  the  boundary, 
one  uses  a  halving  process,  starting  with  left  endpoint  at 
r/4  and  right  endpoint  at  the  boundary,  to  find  a  point  where 
4>(t)  >  0  ,  and  puts  b  equal  to  that  t.  If  in  case  (4. 1)  it  is 

that  is  at  the  stop,  we  test  the  sign  of  $(t)  .  If  that  is 
negative,  we  put  5  =  and  pass  to  the  projective  transformation. 

If  it  is  negative,  we  put  b  =  and  pass  to  the  halving  process. 

If  in  case  (4.2)  the  upper  stop  is  d/2  ,  we  proceed  similarly. 

# 

At  this  point,  if  we  have  not  found  §  from  either  of 
the  two  special  cases  noted  above,  we  have  a  lower  limit  a  =  r/4  , 
and  an  upper  limit  b,  for  the  minimum.  We  then  begin  the 


I 


halving  process,  testing  the  midpoint  and  at  each  step  redefining 
a  or  b  .  If  in  the  unlikely  event  that  we  find  a  point  where  *(t)  =  0 
the  process  stops  .  Supposing  it  does  not  stop  that  way,  we  run 
it  a  few  iterations,  perhaps  ten  or  twenty  ;  there  is  no  reason 
to  seek  high  precision  here.  Then  either  the  *  at  the  stop, 

J± 

or  the  §  corresponding  to  d  =  (a+b)/2  will  be  taken  to  be  . 


>rojective  transformation  .  This  is  very  easily 


executed.  One  simply  uses  (3.11),  with  N  replaced  by  N  +  l  , 
_N  +1  N  # 

to  define  H  from  £  and  ^  .  One  is  then  ready  to 


execute  step  N  +  l  . 


The  algorithm  terminates  when  the  criterion 
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An  Experimental  Approach  to  Karaarkar'a  Projective  Method 
for  Linear  Programing 
by 

J.A.  Tomlin 


Introduction 

A  September  1984  article  in  Science  (13]  made  breathless  claims  for 
the  speed  and  efficiency  of  a  new  linear  programming  (L?)  algorithm  devised 
by  M.  Karmaricar  [10].  Furthermore,  claims  were  made  chat  numerical  results 
prove  the  new  mechod  up  to  30  times  faster  than  the  simplex  method. 
Unfortunately,  no  information  on  the  test  problem(s)  or  experimental 
procedures  were  given.  Only  limited  details  of  the  new  implementation  have 
so  far  been  discussed  publicly. 

At  first  sight,  the  prospects  for  such  a  new  algorithm  are  not 
bright.  Thirty-five  years  of  attempts  to  defeat  the  simplex  mechod  [2] 
have  met  with  uniform  lack  of  success,  the  most  recent  being  the  Scolnik 
and  ellipsoid  fiascos.  This  generally  gloomy  outlook  is  complicated  by  two 
further  factors  implicit  in  Karmaricar '3  approach: 

(1)  Like  the  ellipsoid  method.  It  cakes  the  unpromising  step  of  non- 
linearizing  a  linear  problem. 

(2)  At  least  initially,  it  makes  the  entirely  false  assumption  chat 
any  feasible  L?  constraint  sec  possesses  a  strict  interior  (i.e., 
a  solution  positive  in  every  component).  Many  real  L?s  are  highly 
degenerate  (17], 

It  seems  that  the  only  way  to  evaluate  these  conflicting  viewpoints  is 
to  perform  some  computational  experiments  on  reasonably  representative  L? 


1 


models  and  extrapolate  Che  reaules.  This  paper  reports  on  the  results  of 
such  a  sec  of  experiments.  The  next  section  reviews  the  steps  of  the 
algorithm  as  outlined  In  (10].  The  following  sections  describe  some 
additional  constructive  scepe  for  getting  started,  the  implementation, 
experimental  results  and  conclusions. 


where  e  is  an  n-vector  of  l's,  A  is  m  *  a,  and  the  minima  z  is 
assumed  to  be  0.  It  is  also  assumed  that  there  exists  a  solution  to  (1) 
with  the  property  that  >  0  (j  •  1,  ...»  n). 

Under  these  assumptions,  let  D  •  dlag  (x^,  ....  xq)  and  employ  a 
projective  trans forma  cion  and  its  inverse,  defined  by: 


2 


D~*x 

•V1* 


Dx' 

•TDx' 


(2) 


The  second  of  these  transforms  Che  LP  on  a  simplex  (1)  inco  che  fractional 
program  on  a  simplex  in  x’ -space: 


subject  to 


Min  z 


cTPx ' 
eTDx’ 


(3a) 


ADx’  -  0 


(3b) 


and 


T 

e  x 


x’  >  0 


(3c) 


Noce  chat  from  che  definition  of  D,  che  point  x  in  x-space  is  mapped 

onto  the  point  (1/a . 1/n)  in  x’ -space.  The  general  idea  is  now  to 

ignore  che  denominator  in  (3a),  and  cake  a  "large"  improving  step  away  from 
che  center  of  che  siaplex  to  a  new  point  in  x1 -space.  This  is  transformed 
back  into  x-space  and  che  result  evaluated. 

Specifically,  che  algorlcho  generates  a  sequence  of  points  x^ *  , 
x  ,  ...»  x  ,  where  Xj  >  0  ( j  ■  1 ,  . . . ,  n)  as  follows: 

Gt)  He) 

1.  Define  D  ■  diag  (x^  ,  ...,  x^  ' ) ,  and: 


3  - 


i 


3 


2.  Compute: 


c  -  II  -  »T(B»T)“lB]  Dc  ,  (4) 

P 

(i.e. ,  project  an  ''ascent''  direction  for  problem  (3)  into  the  null-’ 
space  of.  its  constraint  matrix  3). 

3.  Normalize  c  and  scale  it  by  the  radius  of  the  largest  sphere  which 
P 

can  be  inscribed  in  the  simplex  (3c)  to  produce  the  direction  vector: 


(5) 


|c  |/n(a-l) 
P 


4.  Take  a  "descent"  step  of  length  a  to  a  new  feasible  point  for  (3): 


x'  •  ~  e  -  a  p  . 


(6) 


5.  Project  x'  back  into  x-space  to  obtain  the  new  point 


(k*l) 

x 


Dx' 

eTDx' 


(7) 


If  the  new  point  satisfies  the  termination  criterion,  scop.  Otherwise,  sec 
k  »  k+l  and  go  to  1. 
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Ic  should  be  clear  that  cha  great  majority  of  cha  work  la  each  itera¬ 
tion  coses  in  seep  2,  which  assures  feasibility  of  cha  sav  poise. 

2.  Getting  Started 

There  are  two  details  to  be  cleared  up  is  obtaining  a  first  feasible 
interior-point  solution  to  LP  (1).  First  of  all,  the  problaa  oust  be 
"homogenized. "  LP  problems  are  sore  commonly  expressed  in  a  canonical  form 
such  as: 


Min  z  »  c  x 


subject  to 


Ax  •  b 


(8) 


x  >  0  . 

Karmarkar  suggested  scaling  the  variables  by  some  plausible  upper 
bound  (say  a)  on  their  sum,  so  that: 


Ax  ■  5  ■  b/  c 


(9) 


e‘x  x  .  •  1 
n+1 


T- 

and  then  multiplying  b  by  e  x  ■  1  to  obtain 


( [ A , 0 J  -  be^)  x  ■  0 


(10) 


.TJ-1 


where  «  and  z  here  are  of  dloenaion  n+1.  Sine*  b  is  normally  such 
danaar  chan  Che  coluana  of  A  axplidt  conatruction  of  (10)  la  co  b« 
avoldad  If  poaalbla. 

0n«  aecapeabla  aethod  is  to  daflna  a  new  variable  corrasponding  to  cha 
rlghc  hand  sida  in  (9)  aftar  scaling,  and  forcing  this  variable  co  be  1: 


Ax  -  (b/o)5 
$ 


-  0  , 

-  1  , 


(11) 


T- 
ft  x 


Vi  ■ 1  • 


The  laac  cwo  conacrainea  nay  ba  replaced  by: 


•T;  - 5  +  Vi  ■ 0 


,T*  ♦  ?  ♦  V,  •  2  • 


Scaling  all  cha  variables  again  by  1/2  we  obtain: 


A 

-  b  /  <3 

0 

0 

T 

a 

-1 

1 

x  ■ 

0 

T 

e 

l 

1 

1 

(12) 


x  >  0 
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and  a  solution  co  (3)  nay  be  recovered  as  Xj  »  2c  x^  (j  ■  1 . a). 

Moca  cfaac  A  is  unaffactad,  but  a  single  dansa  row  is  addad. 

Karmarkar  also  suggested  tha  construction  of  an  artificial  column  to 
obcain  a  starting  intarior  solution  to  (1).  This  involves  adding  a  saw 
column  (say  d)  and  variable  Co  (lb)  and  (1c)  — 

Ax  +•  dX  -  0 

T 

a  x  +  V  *  1 

such  chat  e/(n+l)  is  a  feasible  solution,  and  then  minimize  X  co  zero 
(or  close  co  zero).  Using  the  construction  (12): 


and  retaining  che  convention  chat  A  is  a  *  a,  a  starting  solution  of 
e/(n-3)  implies  chat  d*b/o-Ae,  6  *  -n. 

For  chese  experiments  we  thought  it  betcer  co  start  with  a  different 
intarior  point,  '-fa  already  know  chat  che  variable  x^^  corresponding  to 
che  right  hand  side  ausc  be  1/2  in  a  feasible  solution  —  quite  distant 
from  l/(n*3).  In  addition,  che  slack  value  x  . , 


nay  be  substantial, 


in 


UNCLASSIFIED 


m. 


L 


microcopy  resolution  «st  chart 

"STw*- "  — s,a6-“4 


ft 


t 

b 


since  eh*  value  a  will  rarely  be  known  accurately.  Finally,  an  Initial 
value  of  l/(ff*3)  for  X.  gives  the  algoritha  very  little  to  "bite"  on, 
since  this  is  not  very  far  from  the  target  value  of  zero  (or  c)  for  even 
moderate  n. 

We  have  used  the  starting  solution: 


J  4n  * 


(J  •  1,  •••»  n) 


(14) 


*rrH  “  Xn+2  “  X  "  4  » 


which  gives  the  artificial  coefficient  values: 


This  has  proved  satisfactory  in  practice. 


3.  Implementation 

Io  initiate  these  experiments  the  author's  LPM1  Fortran  LP  code  was 
modified  to  give  a  test  bed  implementation  (LPMK)  of  Karmarkar's  algorithm 
for  sparse  LP  problems  of  moderate  size.  (We  had  previously  modified  this 
code  to  test  Scolnik's  approach  [15]). 

The  matrix  (A,  -b /a]  has  its  non-zeros  stored  contiguously  column¬ 
wise  in  arrays  IA(*),  A(*),  specifying  the  row  indices  and  values. 

Another  array  LA(*)  points  to  the  beginning  of  each  column.  Slack 
columns  are  stored  for  each  L  and  G  row  in  the  conventional  MPS  format 
[12],  but  no  logical  column  is  generated  for  an  E  (equality)  row.  All 


I 


I 
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free  rows  arc  dropped,  except  che  designated  cose  row,  which  is  scored  as  a 
full  vector.  All  of  these  arrays  are  constructed  as  the  L?  model  is  read 
in  MPS  format.  Note,  however,  ehat  bounds  and  ranges  are  not  allowed. 

The  artificial  column  is  constructed  immediately  after  input,  but  che 
two  dense  rows  (sr^l)  and  (**2)  are  never  stored  explicitly. 

As  we  have  pointed  out,  the  bulk  of  che  work  in  the  algorithm  is  che 
calculation  of  a  projected  vector  (4).  It  turns  out  chat  the  calculation 
(4)  is  equivalent  to  finding  che  residual  of  a  least  squares  problem: 

-  y 

y  •  arg  min  IDc  -  3  yJ^  (15a) 

y 

c  •  Dc  -  B*y  .  (15b) 

P 

There  are  several  methods  of  computing  this  vector  (see  [6,7,3]).  The 
method  chosen  for  this  implementation  is  che  popular  one  of  computing  a 
matrix  decomposition: 

QBT  -  [  q  ]  .  QDc  -  [  -1  ]  (16) 

X 

where  R  is  upper  triangular,  Q  Q  •  X  and  3  is  Initially  assumed  of 
full  rank.  cp  may  Chen  be  computed  via  y  *  R*1  and  (15b)  or: 

c  •  [  I  -  3  R  R3|Dc.  ( i ' ) 

P 

(Alternative  procedures  will  be  discussed  below.) 

In  our  initial  code,  Q  was  computed  as  a  product  of  Householder 
transformations: 
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Q  -  Hc  •••  H2Hl  , 


where 


and 


\  "  1  "  *k 


w0t)  w(k)T 


2iHk- 


1  . 


Each  transformation  eliminates  tha  elements  (W),  (lc+2) ,  ...  of 

T 

column  1c  of  •••  B  . 

T 

Soma  attaapt  to  exploit  structure  was  aade  by  permuting  B  in  the 
general  form  of  Figure  1  (sea  Saunders  [16]).  Unfortunately,  b  is 

usually  quite  danse,  and  d  often  completely  dense,  and  any  QR  decompo- 

T  00 

sitlon  of  B  results  in  a  completely  dense  &  as  well  as  dense  w 

vectors  to  be  stored  for  the  H^. 

The  calculations  involved  in  computing  the  descent  direction  in  (17) 
are  simply  routine  sparse  matrix  multiplication  and  forward  and  back  sub¬ 
stitution  with  R. 


4.  Experimental  Procedures 

Table  1  displays  the  characteristics  of  11  test  problems.  The  first 
two  are  merely  small  text  book  models.  GNET20  is  a  20  node  generalized 
network  (transportation)  problem.  STD23  is  the  "standard  23  row  refinery 
model"  used  for  many  years  as  an  illustrative  model.  AFIRO,  ADLITTLE, 
SHARE23,  ISRAEL,  BRANDY,  E226,  and  BANDM  are  realistic  models  which  have 
been  used  quite  extensively  as  test  problems.  Note  that  the  non-zero  count 
Includes  the  slack  columns  and  right  hand  side,  but  not  the  two  dense  rows 
and  the  artificial  column  d  in  (13). 


Table  2  gives  solution  date  for  Che  models  using  Kecron's  MPSIII 
system  (12].  The  times  include  initialization,  CONVERT,  SETUP,  solving  via 
WH1ZAR2)  and  SOLUTION.  The  time  quoted  is  for  the  entire  job  step  on  an  IBM 
3033/N.  The  number  of  "eta  non-zeros'*  given  refers  to  the  number  produced 
by  the  final  INVEST  before  the  solution  is  printed.  The  "Crashed"  columns 
are  made  basic  by  inspection  in  WHIZARD  before  beginning  the  simplex  Itera¬ 
tions. 

Two  important  parameters  are  needed  for  the  projective  algorithm: 

(a)  A  convergence  tolerance  oust  be  specified.  The  objective  functions  of 

the  test  problems  have  been  translated  by  their  known  optima  so  chat 

they  assume  their  minima  at  zero  (see  [10]).  All  the  problems  except 

AU LITTLE  and  ISRAEL  have  optimal  values  of  order  close  to  10^.  Those 

-3 

problems  had  their  objectives  scaled  by  10  to  bring  them  into  this 
range,  and  the  convergence  criterion  for  the  translated  optima  was 

•A 

arbitrarily  sec  to  10  for  all  runs. 

(b)  The  length  of  cne  step  a  must  be  specified  or  computed.  Karmarkar's 
paper  [10]  implies  that  a  should  be  some  constant  less  chan  1  (in 
fact  0.25  is  suggested  as  "safe").  Experiments  using  fixed  values  of 
a  for  two  problems  shoved  the  number  of  iterations  required  for  con¬ 
vergence  to  be  inversely  proportional  to  a  (see  Table  3).  An 
initial  (constant)  setting  of  a  ■  0.9  seemed  "safe,"  and  was 
adopted. 

Table  4  gives  solution  data  for  this  version  of  the  projective 
method.  Two  factors  ocher  chan  iteration  numbers  and  times  are  of  particu¬ 
lar  interest.  These  are  the  number  of  non-zeros  in  the  Q  and  R  factors 

T 

and  the  approximate  condition  of  BB  ,  which  is  critical  for  the  accuracy 
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of  th«  projection  seep  (4).  The  condition  of  &  is  approximated  by  caking 

the  raeio  of  Che  maximum  diagonal  element  f  {  and  the  minimum  [  | . 

T 

This  raeio  is  squared  to  approximate  *(BB  ).  In  general  ehe  number  of 
non-zeros  in  Q  and  R  decreases  throughout  ehe  run,  while  ehe  condition 
number  Increases  monoeonically. 

These  initial  results  show  three  important  trends: 

(a)  A  noc-unexpeceed  deterioration  In  the  condition  of  R  (and  hence 

T 

BB  )  as  Che  algorithm  progresses. 

(b)  A  disappointingly  large,  but  quite  slowly  growing,  number  of 
iterations  with  problem  size. 

(c)  Catastrophic  Fill-In  of  Q  and  R. 

Ill-conditioning  of  R  is  to  be  expected  as  diagonal  elements  of  0 
approach  zero  and  hence  B  approaches  column  rank  difficiency  (especially 
for  degenerate  models).  A  may  also  be  row  rank  deficient  initially,  since 
we  do  not  necessarily  have  a  full  see  of  unit  columns.  We  oust  therefore 

T 

have  a  method  of  rejecting  rows  of  A  (columns  of  B  )  for  elimination  In 

(16).  This  may,  of  course,  be  done  by  means  of  a  tolerance  on  acceptable 

•12 

values  of  |r11|.  A  value  of  10  is  used  here  for  all  elements  of  Q 
and  R.  Some  more  sophisticated  method  is  clearly  desirable. 

The  ill -conditioning  of  R  can  sometimes  be  overcome  simply  by 
avoiding  its  use.  M.A.  Saunders  suggested  using  the  formula: 
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(where  the  0  diagonal  matrix  is  of  the  row  dimension  of  B),  rachar  than 
(15b)  or  (17),  involving  only  cha  necessarily  wall-conditioned  Q.  Experi- 
aanc  shoved  chat  this  did  indeed  allow  convergence  eo  tighter  tolerances , 
chough  (18)  may  require  prohibitively  more  work  if  the  nuaber  of  non-zeros 
in  Q  is  touch  greater  chan  in  R  and  B. 

5.  Modified  Step  Length  Calculation 

The  nuaber  of  iterations,  as  we  have  observed,  tends  to  be  Inversely 
proportional  to  a .  There  is  in  fact  no  good  reason  to  restrict  a  to  be 
less  then  1,  except  for  proving  theoretical  complexity  results  [10].  What 
we  do  require,  from  (6) ,1s  chat: 


a  <  o  ■  min  — .  (19) 

?1  >  0  api 

(k+1)  (k) 

We  would  also  like  to  guarantee  chat  a  <  z .  This  may  not  happen 

in  some  circumstances.  In  particular  it  was  observed  chat  when  some  of  the 
problems  were  Infeasible  (due  to  input  errors)  and  a  was  fixed  (0  <  a  <  1) 
the  phase  I  objective  function  \  oscillated,  and  these  oscillations  did 
not  converge.  This  un-robusc  behavior  suggest  a  look  at  the  conditions 
under  which  we  expect  z^  to  improve. 

Defining 


r  (k) 

}  PJ 


p  (k) 

v  -  n  |  xj  Pj  , 


we  easily  see  from  (la),  (6)  and  (7)  chat 


(20) 
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(21) 


.(IcH) 

j 


*  aj  p 
-  a  v 


,(k+l)  Z^  ~  g  u 

1  -  a  v 


(22) 


Clearly  Z(^+D  is  oonotooic  In  a,  but  a  decrease  is  noe  guaranteed.  The 

(k+1)  (k) 

four  eases  which  do  satisfy  z  <  z  for  soae  range  of  a  ere: 

(k)  _  _ 

u  -  v  z  >0  and: 

(i)v>0-»0<a<  1/v, 

(11)  v  <  0  0  <  a, 

u  “  v  z(k)  <  0  and: 

(ill)  v  >  0  ^  <x  <  0, 


(Iv)  v  <  0  ^  1/v  <  a  <  0. 


The  oscillations  oust  be  due  to  cases  (ill)  or  (iv)  arising  when  we  insist 
chat  a  be  a  positive  constant. 

The  oscillating  case  may  be  avoided  by  setting  a  <  0  is  cases  (ill) 
and  (iv).  Note  chat  this  has  never  been  observed  to  happen  in  practice 
unless  a  model  is  Infeasible  or  the  constant  a  in  (9)  is  chosen  to  be  coo 
small. 

The  a  calculation  to  be  used  in  practice  is  then  to  specify  a 
multiplier  0  <  a  <  1  and  in  cases  (1)  and  (11)  above  sec: 


(i)  a  •  a  min  {$,  I/v} 


(v  >  0) 


(23) 

(ii)  n  •  a  $  (v  <  0) 

otherwise  ve  simply  put  a  -  -5  la  cue  (iii)  or  a  •  min  {-a,  1/v} 
la  cue  (iv). 

The  problem  of  fill-la  was  left  temporarily  la  abeyaace  aad  the  sec  of 
problems  la  Table  4  rerun  with  che  revised  seep  length  ealeulaclon  (23). 

The  number  of  iterations  la  Table  5  Is  reduced  by  a  factor  of  about  4 
for  the  larger  models.  There  does  seem  to  be  some  loss  of  accuracy  la  chat 
there  was  difficulty  in  attaining  convergence  for  problem  SHARZ2B.  This 
was  immediately  overcome  by  use  of  formula  (18). 

6.  Reducing  Fill-in 

The  presence  of  the  right  hand  side  column  b  in  8,  and  of  d  In 
phase  I,  almost  guarantees  complete  fill-in  of  R  (and  of  Q  within  che 
envelope  implied  by  an  ordering  such  as  seen  in  Figure  1).  This  cata¬ 
strophic  fill-in  can  be  avoided  by  Initially  omitting  che  offending  columns 

T 

from  3  (i.e.  ,  rows  from  3  )  and  using  an  "update"  to  compute  che  least 
squares  solution. 

If  B  •  (3j  Bj],  where  3^  contains  che  "nasty"  columns,  che 
procedure  is  to  form  che  modified  factorization 

Q  b[  •  R  , 

and  solve  a  modified  least  squares  problem  to  obtain: 
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t 


T 

argalalDjC^  - 


where  0  and  e  arc  partitioned  conformably  with  B. 

A 

There  arc  a  number  of  ways  of  modifying  this  solution  y  to  obtain 
the  solution  to  the  full  problem  (15)  (see  e.g. ,  [3]).  In  its  simplest 
form  this  updating  requires  us  to  calculate: 


F  - 


T  --1 

3,  R 


f  - 


D2£2 


4* 


A 

y  +  R 


FTv 


(24) 


where  v  solves: 


(I  +  FFT)v  -  f  .  (25) 

Ignoring  the  work  required  in  solving  (25) ,  this  calculation  Is  dominated 

.  x 

by  a  forward  substitution  using  R  for  each  row  of  B2  and  a  back 

X 

substitution  for  F  v.  The  assumption  is  that  the  total  work  will  be  less 
than  chat  using  QR  when  R  becomes  dense.  Mote  that  only  the  updated 
solution  y  for  (I5a)  is  obtained,  not  an  updated  factorization. 

7.  Modified  Phase  I 

In  addition  to  the  above  feature,  another  improvement  was  made,  which 
can  reduce  the  number  of  iterations  in  phase  I.  Karaarkar  has  pointed  out 
[li]  chat  X  need  not  be  non-negative  and  can  be  omitted  from  the  step 
length  calculation  (19).  If  the  a  chosen  results  in  X  <  0,  one  can 

(It)  (k+1) 

interpolate  between  x  and  x  to  find  an  x  such  chat 
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(It) 

X  *  0.  In  our  cerms,  using  \  for  x  in  (21),  and  leeeing  p^ 
correspond  co  che  component  of  p  for  ehe  artificial  column  d,  if 


1  -  0 
1  -  <rv 


<  0  , 


chan  rasec  a  co  l/(npd). 

8 .  Minimum  Degree  Ordering 

Ones  cars  is  taken  co  avoid  fill-in  in  &  (actually  R)  ic  makes 

sense  co  use  a  more  sophiscicaced  ordering  chan  chac  employed  so  far.  The 

machod  chosen  was  cha  “minimum  degree“  ordering  of  AA  (see  [4]),  since 

T  T  T  T 

cha  non-zero  scruccure  of  L  ,  where  P  AA  P  ■  LL  ,  is  Che  same  as  che 

T  R 

non-zero  scruccure  of  R  when  we  compuce  QDA  P  *  [q]  . 

This  ordering  requires  finding  che  non-zero  scruccure  (noc  che  values) 

7 

of  AA  ,  chac  is  ics  "adjacency  matrix,"  and  Chen  application  of  che 
minimum  degree  algorichm  co  chis  scruccure.  We  used  che  daca  scruccures 
and  procedures  in  [4]  wichouc  oodificacion  and  accained  very  satisfactory 
results.  The  only  exception  co  use  of  chis  ordering  was  for  problem 
SHARE23,  which  cums  ouc  co  have  a  nacural  "block  angular"  ordering  with 
dense  diagonal  blocks  above  che  linking  rows.  This  scruccure  is  almost 
ideal,  and  is  used  as  ic  scands. 

The  non-crivial  models  in  Table  5  were  run  again  with  the  three 
enhancements  in  Seccions  6-8,  and  che  resulcs  are  displayed  in  Table  6. 

The  laprovemenc  is  substantial,  and  che  Claes  for  chis  subsec  of  models 
begins  to  bear  comparison  wich  che  MPSIII  times. 
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9.  Oat  of  Givens  Rotations 


Aeetapeiag  to  build  on  this  aucetas  and  solve  model  E226  oat  immediate 

difficulty.  For  largtr  problems,  trying  to  parfora  the  QR  dtcompoaiclon 

using  Houstholdtr  transformations  becomes  Impractical  as  tha  number  of 

Intermediate  and  final  non-zeros  becomes  very  large,  as  Illustrated  by 

Heath  (8].  The  method  of  George  and  Heath  [3]  overcomes  this  by  never 

T 

storing  Q  and  using  Givens  rotations  to  eliminate  rows  of  3  ,  building  R 
with  a  row-wise  data  structure. 

Osing  the  George  and  Heath  technique  we  are  able  to  make  use  of  the 
full  power  of  the  minimum  degree  ordering  and  symbolic  factorization  pro¬ 
cedures  described  by  George  and  Liu  {4].  The  adjacency  matrix  is  con- 

T 

struct ad  as  before,  and  the  minimum  degree  ordering  (of  the  columns  of  A  ) 
is  found.  This  is  followed  by  a  symbolic  factorization  which  predicts 
where  all  the  non-zeros  in  R  will  be  (assuming  no  cancellation).  The  data 
structure  for  this  (mildly  pessimistic)  estimate  of  the  sparsity  pattern  of 
R  is  set  up  once-and-for-all,  and  used  in  all  subsequent  factorizations 
(at  least  in  this  implementation),  ftote  chat  since  Q  is  discarded,  we 
update  the  Dc  vector  simultaneously  with  the  factorization,  computing: 

q[daTP  ;  De  i  e  ;  Dc]  -  rj  i  QDc]  .  (26) 

t  »  »  v  r 

(Jslng  this  approach,  we  were  able  to  run  all  the  models  in  Table  I, 

with  results  as  shown  in  Table  7  (omitting  Che  first  two  models). 

There  are  some  differences  in  the  data  displayed.  Since  we  work  only 

with  R,  obtaining  c  directly  as  in  (15b),  the  estimate  of  <c(R),  not 

P 
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its  square ,  is  given.  There  is  cow  only  one  masher  for  a  non-zeros,  as 
computed  in  the  symbolic  factorization  of  AA  •  (This  does  not  include 
diagonal  non-zeros  of  a  or  chose  in  the  last  two  columns . )  The  size  of 
the  step  length  a  actually  attained  is  also  of  interest  (note  that  a  « 
0.99  is  used  here). 

10.  Discussion  of  Computational  Results 

It  is  clear  that  the  Givens-George-Heach  approach  has  again  resulted 
in  substantial  improvement,  and  enabled  the  solution  of  the  larger  models 
in  tolerable  —  if  not  competitive  —  time. 

The  very  bad  performance  on  problem  ISRAEL  is  due  to  the  presence  of 
three  very  dense  columns  (as  well  as  several  dense  rows)  in  Che  LP  matrix. 
This  difficulty  could  be  partly  overcome  by  using  an  update  to  introduce 
these  columns  — 1  chat  is  including  them  as  "nasty"  columns  along  with  b 
and  d.  The  present  implementation  does  not  allow  this.  However  if  these 
dense  columns  are  omitted  from  the  matrix  the  number  of  non-zeros  in 
(symbolic)  R  drops  to  4325,  indicating  that  a  two-  or  three-fold  improve¬ 
ment  may  be  possible.  However,  there  oust  clearly  be  a  point  at  which  the 
extra  work  in  the  updating  approach  begins  to  outweigh  the  savings  in 
sparsity  of  R.  Note  that  the  simplex  method  had  so  particular  difficulty 
with  this  problem. 

The  number  of  iterations  appears  to  grow  quite  slowly  with  model 
size.  Karmarkar  (ill  has  claimed  that  Che  number  of  iterations  should  be 
0(log  n).  While  we  would  not  make  so  formal  a  claim,  these  results  by  no 
means  contradict  it. 
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An  encouraging  sign  is  the  relative  sparseness  of  R  (escape  for 
problea  ISRAEL).  Tha  number  of  non-seroa  is  generally  only  2  or  3  time* 
ehe  number  of  non-xeros  in  the  factors  of  ehe  opeiael  basis.  Given  the 
vulnerability  of  QR  decomposition  to  fill-in,  this  speaks  well  of  the 
mini, mum  degree  ordering.  Since  tha  method  behaved  well  in  the  case  of 
block  angular  A  (as  should  be  expeccad)  it  suggests  that  nested  dissec¬ 
tion  [4],  or  incomplete  nested  dissection  might  also  be  worth  investigat¬ 
ing. 

The  relative  sparsity  of  tha  resulting  R  oust  not,  however,  be  con¬ 
fused  with  the  amount  of  work  required  to  obtain  it  via  QR  decomposition 

X 

of  B  .  Theoretical  considerations,  and  our  numerical  results,  indicate 
that  this  can  grow  drastically  with  n,  even  using  a  close  to  ‘’state  of  the 
art”  direct  method  (see  [3]  for  later  developments). 

11.  Finding  "Exact"  Optimum  Data 

In  practice,  one  mist  ask  how  a  usable  optimum  basic  feasible  solution 
to  the  LP  (and  its  dual)  should  be  extracted  when  the  projective  method  is 
deemed  to  have  converged.  One  might  hope  to  use  the  projective  method  to 
“front-end”  the  simplex  method.  As  it  happens,  many  MP  systems  contain  a 
procedure  quite  suitable  in  principle  for  this  task.  In  all  descendents  of 
MPS/360  (Including  MPSIII  and  MPSX/370)  this  is  known  as  the  BASIC  proced¬ 
ure.  Benichou  et  al.  give  a  general  description  of  this  method  in  section 
2.7  of  [1]  (see  also  (9,12]).  Essentially,  BASIC  accepts  a  non-beslc  solu¬ 
tion  to  a  LP  and  transforms  it  into  a  basic  solution  whose  objective  (phase 
I  or  II)  value  is  at  least  as  good.  This  may  be  followed  by  the  simplex 
method  to  attain  optimality. 
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Experiments  with  th«  first  (fixed  a,  Householder)  version  of  the  code 
indicated  that  the  projective  method  could  provide  a  reasonably  good  start¬ 
ing  solution,  via  BASIC,  even  with  loose  phase  II  convergence  criteria. 

Thus  Table  8  gives  data  on  the  BASIC  and  simplex  steps  required  to  optimize 
SHAJQE2S  when  the  projective  method  is  stopped  early  and  at  termination.  It 
is  noticeable  chat  feasibility  in  the  projective  method  deteriorates  toward 
the  optimum,  and  early  termination  provides  a  satisfactory  starting  solu¬ 
tion  for  BASIC.  For  larger  problems,  and  with  a  longer  step  length  (fewer 
iterations)  this  deterioration  is  more  marked,  and  it  proved  difficult  to 
get  good  starting  solutions  for  BASIC.  Indeed  BASIC  and  subsequent  simplex 
iterations  were  sometimes  at  many  as  chose  required  to  solve  the  problem 
without  the  projective  method  as  a  "front-end."  This  suggests  that  an 
additional  Iteration  (or  more)  should  be  carried  out  at  "convergence"  to 
enforce  feasibility,  or  that  perhaps  some  "composite"  steps  should  be  per¬ 
formed,  which  attempt  to  restore  feasibility  as  the  optimum  is  approached. 
This  has  not  been  attempted  in  this  series  of  experiments. 

12.  Conclusion 

Following  an  experimental  trail  beginning  with  [10],  we  have  been  able 

to  implement  a  version  of  Che  projective  method  which  solves  realistic  LP 

models  in  respectable,  in  not  really  competitive,  times.  This  is  not  to 

say  chat  further  improvements  are  not  possible.  Karmarkar  has  hinted  [11] 

that  Bell  Labe  are  experimenting  with  the  use  of  incomplete  Choleskl 

X 

factorization  of  3B  to  precondition  a  conjugate  gradient  method  [7]  for 
solving  the  least  squares  problems  at  the  core  of  this  method.  This 
promises  to  lead  to  soma  improvement  over  the  direct  QR  methods  to  which 
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these  experiments  have  bMn  confined.  An  even  more  promising  approach  la 
soma  preconditioned  form  of  the  L5QR  algorithm  [14J. 

Evan  chough  more  sophisticated  schemes  for  solving  sparse  laasc 
squares  probleas  may  lead  to  markedly  improved  performance,  it  la  difficult 
to  sea  hov  they  can  lead  to  SO  to  1  improvements  over  eha  simplex  method 
except  in  very  special  cases.  As  it  happens,  ehe  modal  for  which  this 
claim  was  made  is  reported  to  be  a  multicommodity  flow  problem  [11],  which 
is  block-angular  and  has  an  otherwise  remarkably  "friendly"  matrix. 

Experience  shows  that  it  is  noc  difficult  to  beat  MPSX/370  (the  chosen 
yardstick)  by  a  wide  margin  on  embedded  network  models  (of  which  the  multi¬ 
commodity  problem  is  a  special  case).  We  have  reported  results  [13]  for  a 
model  with  about  5000  network  rows  and  700  "hard"  rows  where  exploitation 
of  the  network  simplex  method  in  MPSXII  enabled  us  to  solve  it  in  about  S 
minutes  of  IBM  3033  time,  whan  MPSX/370  —  used  naively  —  had  failed  to 
terminate  in  over  an  hour.  Other  practitioners  were  able  to  tie  a  differ¬ 
ent  network  optimizer  in  with  MPSX/370  and  solve  the  model  in  about  11 
minutes. 

Finally,  our  observations  may  be  summarized  as  follows: 

(1)  The  number  of  iterations  required  by  the  projective  method  does 
indeed  grow  slowly  with  modal  size. 

(2)  The  work  per  iteration  is  substantial,  and  limited  by  the  technol¬ 
ogy  for  solving  sparse  least  squares  problems  [3]. 

(3)  Fast  improvements  in  speed  over  the  simplex  method  only  seem  pos¬ 
sible  for  special  classes  of  problems,  and  ehese  may  be  problems 
for  which  greatly  improved  simplex  technology  is  already  avail¬ 
able. 
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Haas 
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Notx-Zsros 

Op  datum 

c r 

3 

3 

4 

15 

11.0 

16.0 

FEEDMIX 

4 

3 

3 

16 

5.2857 

16.0 

GNET20 

20 

0 

44 
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760.179 

162 

STD23 

22 

12 

23 

103 

-45.53024 

163 

AfIRO 

27 

19 

32 

110 

-494.75293 

163 

ADLITTLE 

56 

41 

97 
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225494.94 

163 

SHARE2B 

96 

83 

79 

801 

-415.73224 

163 

ISRAEL 

174 

174 

142 

2274 

896644.8 

164 

BRANDY 

220 

54 

249 

2256 

1518.51 

164 

E226 

223 

190 

282 

2876 

-18.7519 

163 

BANDM 

305 

0 

472 

2612 

158.628 

164 

Table  l.  Tut  Pro  bleu  Statistics 
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Muse 

"Crashed'*  Cols. 

Simplex  Its. 

Eta  Mon-Zeros 

Time  (Secs.) 

2 

1 

6 

0.58 

FEEDMIX 

1 

3 

14 

0.59 

GNET20 

19 

20 

69 

0.72 

STD23 

10 

6 

72 

0.52 

AFIRO 

13 

4 

52 

0.53 

ADLITTLE 

13 

80 

266 

1.02 

SHARE2B 

13 

84 

558 

1.06 

ISRAEL 

1 

166 

1480 

2.35 

BRANDT 

83 

168 

1467 

2.66 

E226 

50 

350 

1490 

3.95 

3ANDM 

114 

253 

2385 

3.51 

Table  2.  MPSIIX/WHT2A2D  Solution  Results 
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Iterations  to  Optimality  j 

a 

STD23 

AFISO 

163 

239 

82 

118 

j  1 

44 

64 

Table  3.  Sensitivity  to  (Fixed)  a 
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ENCLOSURE  D 

"ON  STEEPEST  ASCENT  WITH  EQUALITY  CONSTRAINTS" 

J.  M.  Danskin 
incomplete  paper,  1986 


ON  STEEPEST  ASCENT  WITH  EQUALITY  CONSTRAINTS 

by 

John  M.  Danskin  (Arlington  Virginia^ 


ABSTRACT 

The  paper  gives  an  algorithm  for  finding  steepest 
ascent  of  a  linear  form  in  RP  .  subject  to  q  constraints  . 
in  at  most  q  -  l  s*:eps.  It  requires  ‘he  storage  of  a  qxq 

There  is  an  application  to  linear  programming. 


dense  matrix. 


I 


§  l •  Int  roduction 


Our  problem  is  to  maximize  2'  y  subject  to  the 


constraints 


I  V  I  --  t 


(i.  n 


and 


J 


V  = 


0  . 


J  €  J  . 


(1.2) 


He  re  j  ,  y  .  and  the 
nonempty  set  of  indices, 
satisfying  M.l)-(1.2). 


are  in  R*3  ,  2  /  Q  ,  and  J  is  a  rinite 
We  denote  by  7  the  set  of  v  - 
We  will  also  call  such  a  /  "admi  ss'ble". 


Our  method  in  the  general  case  is  -o  set  up  a  'inear 
manitold  Z  m  R*3  and  then  to  tind  the  point  c  -  Z  closes-  to 
the  origin. 


We  deal  in  i  2  with  a  -rivial  exceptional  case.  In 
j  3  we  construct  the  linear  manitold  Z  and  expla;n  the  significance 
of  c.  In  §  4  we  show  huw  to  tind  c  in  tmitely  many  s‘eps.  In 


I 


2 


|  5  we  give  the  easy  estimate  for  computation  time.  In  $5-8 
we  note  an  application  to  linear  programming.  In  §  10  we 
recall  Leinke's  closest-point  principle  [3]  and  the  connection 
of  our  method  with  it. 


§  2.  The  exceptional  case 

We  denote  by  the  set  of  j  -  J  for  which  2  •  i  ■  T  0 
and  write  J  =  J  -  J®  .  The  exceptional  case  is  the  case  J®  -  J  : 
all  the  b  .  are  orthogonal  to  2  .  In  that  case  the  vector 

V  =  2  '\  2  I  (2.  1) 

satisfies  *:he  conditions  (l.l)  and  (1.2).  and  2  •  v  -  *-|  • 

For  any  unit  y 1  £  whateve r,  we  have  2  •  v  =  J  2  |  /  •  v '  <  1 1  j  . 

with  equality  holding  only  if  y'  =  v  .  Hence  <  uniquely  solves  *:he 
problem  (I.  1) - ( l.  2),  and  the  maxi  mum  J  s  \  =  |  2  |  . 
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5  5  •  The  gene  rat  ca» 


Now 


rA 


is  not  empty.  Suppose  tirst  that:  also  J  >.3 


not  empty.  The  vectors  b-  tor  j  j  J  now  generate  a  cone 

in  R^  .  Using  the  G  ram -Schmidt  o  rthog ona li zati on  process, 
we  construct  an  orthogonal  basis  f  e^]  ^  K  tor  '  ^>ut 


h  i 

~  J 


k  -  K 


ii  .  .  2,  l  e,  /  e, 
j  k  k  k 


€  J1 


(3.  1) 


Put 


X'  : 


%  i 

(  2  •  S  j! 


i  -  J 


We  will  suppose  trom  now  on  tha*-  has  q  elements, 
by  3  ‘he  hyperplane  in  R^  given  by 


(3.  21 


Denote 


z .  =  l 

J 


'  3.  3 


j  €  J 


The  linear  manifold  /  is  ‘he  image  of  3  :n  R^5  given  by  the 

mapping 


2 

J  Si' 


X 


Z  ,  X' 

J 


z  -■  3 


1  3.  4) 


4 


We  note  three  cardinal  properties  of  £  .  First 


2  •  x  -  1  for  a 1 1  x  i  £  • 


This  follows  from  !3.2)  and  (3.4).  Second, 


X  •  /  '  =  0  to  raH  x  c  £  and  ■/'  c 


It  suffices  ‘o  prove  that  k 


0  tor  ‘he  i\  ot  (3.  1)  and 
J 


any  admissible  v'  .  But  this  ‘ollow.s  because  any  admissible 
v'  is  by  definition  normal  to  the  j.  tor  i  -  Jl  ,  which  takes 
ca  re  ot  the  ti  rs*  ‘err  ,  n  ‘be  ght  s  ,de  of  i  3.  1 1 .  As  ‘o  the 

second  it  is  bv  definition  norma'  ‘o  ‘he  k  tor  j  - 

'  j 

hence  normal  to  3  .  and  hen--  no -u  a',  ‘o  ea-h  element 

e^  ot  the  orthogonal  bas's.  The  ‘hi -d  p-ope-‘y  is  ‘hat 

x  .  \  =  0  to  r  a'_l_  x  £  *_rd  ■  j  J3  . 

This  is  so  because  all  the  k '  ot  3.1'  are  or‘005  inal  to 

Now  let  c  be  the  point  on  £  noses'  to  the  origin. 

£  has  relative  to  c  the  following  fourth,  obvious  p-opertv: 


c  •  (x  -  c)  =  0  tor  all  x  -  £ 


(3.  5) 


(3.  bi 


3. 


The  n 


(  3.  ?' 


5 


Now  put 

.  .  2 

h  =  2  -  c/c 


(3.  oi 


It  may  be  that  h  =  0  .  Then,  since  c  c  ■£  satisfies  (3.  6> , 
2  •  y'  =  0  for  all  admissible  y'  and  there  is  no  direction  ot 
constrained  ascent  at  ail. 


Otherwise  h  ji  0  .  Then  put 

\  -  |  h  |  .  y  =  h/>.  .  (3.  10) 

Then  (l.  1)  is  satisfied.  As  ‘o  the  constraints  in  (1,2)  with  j  i  , 

3*  ij  =  0  by  the  detinition  of  ,  and  b  .  c  =  0  by  (  3.  7)  , 

As  -o  the  constraints  in  (l.  2)  with  j  ~  J  .  wp  have 

x*’»h=3c^«j-x^*c/c  =  1-1  =  0  i  3.  11) 

by  (3.5)  and  (3.8)  .  so  that,  by  the  definition  '3.2)  , 

♦ 

t;.h  =  0  .  j  €  J1  .  (3.121 

But  since  the  in  (3.1)  are  orthogonal  to  both  2  and  c  ,  (3.  12) 


f 
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implies  that 


b  .  •  h  =  0  .  j  €  J1  .  (3. 13) 

Hence  all  the  conditions  (1.2)  are  satisfied,  and  y  €  7  • 

Now  suppose  y'  is  any  admissible  vector.  Then  c  •  /'  =  0 
by  ( 3.  6) ,  so  that 

J  I  V1  :  (3  •  C^C^I*  y'  !  i  /  >■/'  .  (3.14) 

Hence  v  yields  *he  unique  maximum  over  7  .  and  that  max  mum 
is  \  .  - 

The  case  in  which  J ^  13  empty  is  somewhat  simpler. 

The  subtracted  projections  in  (3.1)  are  now  gone.  V.  =  tor  all 
j  ;  j'  ;  J,  and  the  resf  of  the  cons* ruction  goes  ‘h-ough  as  befo_e. 

We  now  show  how  ‘o  tind  c .  by  an  iteration  with  at 


most  Min  [  p,  q  -  l } 


moves. 
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§  4.  Location_ot_the  closes*:  point  c 


In  what  follows  we  suppose  that  the  number  q  of 
elements  in  J1  is  at  least  3,  the  cases  q  =  l,2  being  obviously 
trivial.  And  we  assume  that  p  >  2  . 

The  first  basic  element  in  our  algorithm  is  steepest 
descent  in  3  .  We  recall  what  this  is.  For  z  <E  3  Pu<: 

P 

Q(z)  =  )  r  /  z .  x^  1  ^  .  (4.  1) 

—  — *  1  J  1 

i  =  i  j?r 

This  is  ‘he  distance  -  squared  trom  the  point  x  given  by  (3.4' 
to  the  origin  in  .  Q  has  the  partials 


a  Q(  z) 
3  z 


2x  •  ,  j  -  -l'  . 


Form  the  direction  numbers 


gz  =  x 


.  (  x  -  xJ  )  ,  j  :  J1  , 


1 4.  21 


(4.  3' 


in  which  x  is  the  "barycenter" 
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x  = 


q 


) 


j  € 


(4.  41 


It  the  gz^  are  all  zero,  x  is  already  the  closest  point  c  . 
Otherwise  the  direction  number  vector  gz  indicates  the  direction 
of  steepest  descent  in  3  •  and  the  distance -aqua  red  is  in  tact 
strictly  decreasing  in  that  direction. 


The  second  basic  element  in  oar  algorithm  is  based 
on  a  simple  geometric  observation,  as  follows.  Suppose  that 
(,  is  any  line  lying  in  ^  .  Then  the  point  x  closest  to  the  origin 

on  t.  is  also  the  point  closest  to  c  on  that  line.  Hence  both  the 
origin  and  the  closest  point  c  lie  in  the  plane  ft  normal  to  K 
at  x  .  With  this  in  mind,  we  will  in  oar  algorithm  restrict 
the  subsequent  search  to  ~  . 


The  algorithm  can  be  sfarted  anywhere.  We  i‘art  it 

at  the  point  z^  ~ 3  with  z?  =  l/q  .  j  c  J1  ,  The  image  ot  z^ 

under  (3.41  is  x  .  If  gz  =  0  ,  x  =  c  and  we  are  through. 

Otherwise  denote  by  gx  the  image  of  gz  under  (3.4)  .  Since 

gz  corresponds  fo  a  direction  of  strict  descent,  then  also  gx  4  3  . 

0 

We  normalize  gx  to  a  unit  direction  vector  gx  .  Now  p-ns  a 

l  _  3  1 

line  K,  through  x  in  the  direction  gx  .  Let  x^  be  the  point 

*)  The  4  sign  distinguishes  ‘his  point  from  the  "vertex" 
x*  defined  at  (3. 2)  . 


“1^ 
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on  K  ^  closest  to  the  origin.  Denote  by  ..  the  plane  normal  to 
(and  -he  re  f  o  reto  gx  '  at  x  d  .  Fro  m  the  cons.  rue  t  i  o  n ,  x 
has  a  natural  inverse  in  9  under  (3.4)  .  We  denote  that  inverse 

t 

by  z  ‘  . 


Now  suppose  -hat  l  <  n  <  Min  f  p»  3  *  1}  .  and  that  we 
have  arrived  at  a  point  zn  ~  0  and  corresponding  point  xfn  £ 
under  (3.4)  .  Suppose  that  we  have  constructed  directions 

gx^ . gxn  ‘  ,  each  normal  to  all  those  preceding  it,  and 

o  n  o  1-1 

planes  "  ,...,n  .  normal  respectively  to  the  gx  ,  .  . .  ,  gx  , 

and  all  containing  c  .  Finally,  we  suppose  that  x#  lies  In 

all  those  planes. 


Now  -onstruct  the  direction  numbers  gz.  ,  j  ;  J  . 

j 

from  (4,  3)  with  x  replaced  by  x^n  .  If  these  are  all  zero, 
x#n  =  c  and  we  are  finished.  Otherwise,  gz  corresponds  -o 
a  direction  of  strict  decrease.  Therefore  its  Image  gx  under 
(3.  4)  is  also  nonzero,  and  corresponds  Lo  a  lirection  if 
strict  decrease  on  £  .  Now  we  need  to  pr-r/e  a  -echnica!  point. 


Denote  by  S  the  space  spanned  by  the  vec-ors 
Introduce  into  S  a  coordinate  system,  with  origin  y 

wi*h  axes  ;Tl,...,;Yn  parallel  to  the  gx° . gz'1''" 

Jet  jj  be  any  unit  vector  in  S  with  components  u . 

Consider  the  ray  t  issuing  from  x^n 


0 

gx  . gx 

at  x“fn  and 

Now 

v  along 
n 

in  the 


u  - 


those  axes. 
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direction  of  u  .  with  s  measuring  the  distance  from  xtn  . 

Now  *he  vectors  gx^ , ....  gx"*1  are  orthogonal  not  only  to 
one  another  but  also  to  the  vectors  x#n  -  c  and  c,  which  a-° 
themselves  orthogonal.  Hence  the  distance -squa-ed  to  the 
origin  from  the  point  on  i  corresponding  to  s  is 

D^(s)=  c2  +  (x^n  -  c)  ^  f  «“  [  £  *  +  .  .  .  +  j.^1 

=  (xin)2  +  s2  .  (4 

Hence  the  directional  derivative  of  the  distance  -squared  in  the 
direction  ju  ,  at  x*n  ,  is  aero.  This  is  the  technical  point 
we  needed. 

Since  the  vector  gx  defined  above  is  nonzero  and 
corresponds  to  a  direction  of  st rict  dec  rease  for  the  distance- 
squared  .  it  does  not  lie  in  S  ,  We  now  put 

0  0  i-i  i-i 

(gx)'  s  gx  -  (gx.gx  )  gx  -  ...  -  fgx.gx  ')  gx  ‘  .  (4 

Since  gx  does  not  lie  in  S,  (gx!'  i  0  . 

We  now  normalize  (gx)'  to  a  unit  direction  vector 
gxn  and  pass  a  line  £n  f  *  through  x^n  in  the  direction  gx11  , 


II 


Denote  by  x^n  ^  the  point  on  t  closest  to  the  origin  . 

Denote  by  ~n+l  the  plane  normal  to  /  n+1  (and  therefore  to  gxal 
at  xinH  .  Let  zn+l  ££  be  the  "natural  inverse"  of  x^ 

We  see  that  all  the  induction  hypotheses  made  a1-  the 
beginning  of  this  general  step  with  n^  1  are  fulfilled.  We 

assumed  that  all  the  planes  .1 . .  contained  c  .  Now  ~ 

does  as  well,  by  its  construction  .  And,  since  x$  lay  in 

all  the  . “n  and  the  direction  gx"  of  (,n^  is  orthogonal 

to  all  the  normals  gx  ,  .  .  .  ,  gx'  to  those  planes,  then  also 

x*n  *  lies  in  all  the  . “n  .  And  it  lies  in  "n  *  by  the 

definition  of  ~n^  .  So  we  are  ready  to  proceed  to  the  nex*' 
step. 


The  dimension  of  £  cannot  exceed  q-l  because  there 
21  o  -l  1 

we. -e  only  q  - 1  vectors ,  x~-x  . xi  -  x  .  in  the  original 

basis.  And  it  cannot  exceed  p  ,  because  £  is  imbedded  in  R*3  . 
Hence  put 

r  -  Min  i  o,  q-l  j  .  1 4. 

The  dimension  of  £  then  does  not  exceed  r  .  Since  we  add  one 

0  » 

element  to  the  orthonormal  basis  [  gx  ,  .  .  .  }  on  each  move,  the 
number  of  moves  cannot  exceed  r  ,  and  the  closest  point  c  will 
be  reached  exactly  on  the  last  move.  □ 


\ 


